# robots.txt
# The following lines list known AI, data-scraping, and plagiarism-checking crawlers.
# They are disallowed to prevent content from being used for AI training and to avoid
# incorrect plagiarism flags on original academic and written work.
User-agent: AnthropicAI
User-agent: TurnitinBot
User-agent: UnicheckBot
User-agent: PlagScan
User-agent: PlagTracker
User-agent: QueText
User-agent: Plagiarisma
User-agent: Copyscape
User-agent: Scribbr-bot
User-agent: DrillBitBot
User-agent: OpenAI
User-agent: Sogou
User-agent: AhrefsBot
User-agent: SemrushBot
User-agent: ia_archiver
User-agent: AI2Bot
User-agent: Ai2Bot-Dolma
User-agent: Amazonbot
User-agent: anthropic-ai
User-agent: Applebot
User-agent: Applebot-Extended
User-agent: Bytespider
User-agent: CCBot
User-agent: ChatGPT-User
User-agent: Claude-Web
User-agent: ClaudeBot
User-agent: cohere-ai
User-agent: cohere-training-data-crawler
User-agent: Crawlspace
User-agent: Diffbot
User-agent: DuckAssistBot
User-agent: FacebookBot
User-agent: FriendlyCrawler
User-agent: Google-Extended
User-agent: GoogleOther
User-agent: GoogleOther-Image
User-agent: GoogleOther-Video
User-agent: GPTBot
User-agent: iaskspider/2.0
User-agent: ICC-Crawler
User-agent: ImagesiftBot
User-agent: img2dataset
User-agent: ISSCyberRiskCrawler
User-agent: Kangaroo Bot
User-agent: Meta-ExternalAgent
User-agent: Meta-ExternalFetcher
User-agent: OAI-SearchBot
User-agent: omgili
User-agent: omgilibot
User-agent: PanguBot
User-agent: PerplexityBot
User-agent: Perplexity-User-Agent
User-agent: PetalBot
User-agent: Scrapy
User-agent: SemrushBot-OCOB
User-agent: SemrushBot-SWA
User-agent: Sidetrade indexer bot
User-agent: Timpibot
User-agent: Seekr
User-agent: VelenPublicWebCrawler
User-agent: Webzio-Extended
User-agent: YouBot
User-agent: yandex
Disallow: /

# The following lines explicitly allow all other user agents to crawl the site,
# except for the directories listed below. Allowing everything else is the default
# behavior, but it's good practice to be explicit.
# The /assets/ directory is disallowed to keep style, script, and image files out
# of search indexes. The /deployment/ directory is disallowed because it contains
# build- and deployment-related files.
User-agent: *
# Disallow crawling of the assets directory
Disallow: /assets/
# Disallow crawling of the setup/build directory
Disallow: /deployment/
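
# The rules above can be sanity-checked with Python's standard-library robots.txt
# parser. This is a minimal sketch, not part of the site's deployment: the inline
# rules string, the agent names, and the example URLs are illustrative assumptions
# that mirror a small subset of the file above.

```python
from urllib.robotparser import RobotFileParser

# A reduced copy of the rules above: one blocked crawler, plus the
# catch-all group that only excludes /assets/.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /assets/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# GPTBot is blocked from everything; other agents only from /assets/.
print(parser.can_fetch("GPTBot", "/index.html"))        # False
print(parser.can_fetch("Mozilla/5.0", "/index.html"))   # True
print(parser.can_fetch("Mozilla/5.0", "/assets/x.css")) # False
```

# Note that stock robots.txt matching groups consecutive User-agent lines into
# one record, so the long list above shares the single "Disallow: /" that follows it.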