In my robots.txt file I have the following sections:
User-Agent: Bot1
Disallow: /A
User-Agent: Bot2
Disallow: /B
User-Agent: *
Disallow: /C
Will the statement Disallow: /C also apply to Bot1 and Bot2?
user-agent: identifies which crawler the rules apply to.
allow: a URL path that may be crawled.
disallow: a URL path that may not be crawled.
sitemap: the complete URL of a sitemap.
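A minimal robots.txt showing all four directives together (the paths and the sitemap URL here are placeholders, not from the question):

```
User-agent: *
Allow: /public/
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
```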
You can use it to prevent search engines from crawling specific parts of your website and to give them helpful hints on how best to crawl it. The robots.txt file plays a big role in SEO.
Google announced back in 2019 that the noindex directive in robots.txt would no longer be honored.
Path rules in the robots.txt file are case-sensitive, so /C and /c are different paths. Because of this, it is recommended to make sure that only one version of a URL is indexed, using canonicalization methods.
tl;dr: No. Bot1 and Bot2 will happily crawl paths starting with /C.
Each bot only ever complies with at most a single record (block).
In the original specification it says:
If the value is '*', the record describes the default access policy for any robot that has not matched any of the other records.
The original spec, including some additions (like Allow), became a draft RFC, but was never accepted/published. In section 3.2.1, The User-agent line, it says:
The robot must obey the first record in /robots.txt that contains a User-Agent line whose value contains the name token of the robot as a substring. The name comparisons are case-insensitive. If no such record exists, it should obey the first record with a User-agent line with a "*" value, if present. If no record satisfied either condition, or no records are present at all, access is unlimited.
So it confirms the interpretation of the original spec.
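The selection rule above can be sketched in code. This is an illustrative implementation of the draft-spec logic, not any real library's API; the function names are made up for this example:

```python
def parse_records(robots_txt: str):
    """Split robots.txt into records: (user_agents, disallow_paths) pairs.

    A new record starts when a User-agent line follows a rule line.
    """
    records = []
    agents, rules = [], []
    seen_rule = False
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if seen_rule:  # previous record is complete
                records.append((agents, rules))
                agents, rules, seen_rule = [], [], False
            agents.append(value)
        elif field == "disallow":
            rules.append(value)
            seen_rule = True
    if agents:
        records.append((agents, rules))
    return records


def select_record(records, bot_name: str):
    """Per draft section 3.2.1: obey the first record whose User-agent
    value contains the bot's name token (case-insensitive); otherwise
    the first '*' record; otherwise None (access is unlimited)."""
    bot = bot_name.lower()
    for agents, rules in records:
        if any(a != "*" and bot in a.lower() for a in agents):
            return rules
    for agents, rules in records:
        if "*" in agents:
            return rules
    return None
```

Running this against the robots.txt from the question, `select_record` returns only `['/A']` for Bot1 and only `['/B']` for Bot2; the `Disallow: /C` rule is picked up solely by bots that match no named record.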
Google, for example, gives an example that seems to follow the spec:
Each section in the robots.txt file is separate and does not build upon previous sections. For example:
User-agent: *
Disallow: /folder1/
User-Agent: Googlebot
Disallow: /folder2/
In this example only the URLs matching /folder2/ would be disallowed for Googlebot.