Unfortunately, I’ve got case-insensitive servers that cannot be replaced in the short term. Some directories need to be excluded from crawling, so I have to Disallow them in my robots.txt. Let’s take /Img/ as an example. If I keep it all lower case…
User-agent: *
Disallow: /img/
… it does not match the actual physical path, and the Disallow directive is not applied to addresses with /Img/ or /IMG/. Crawlers treat these variations as distinct paths.
It’s interesting to look at Microsoft’s robots.txt in this regard. They probably use IIS servers, and the SERPs are full of disallowed addresses, only in a different letter case.
What can I do?
Is it valid (and effectual) to state the following?
User-agent: *
Disallow: /Img/
Disallow: /img/
Disallow: /IMG/
The Robot Exclusion Standard is purely advisory: it's completely up to you whether you follow it, and if you aren't doing something nasty, chances are that nothing will happen if you choose to ignore it.
The original robots.txt specification doesn't say anything about letter case in file paths, but according to Google's robots.txt specification, paths are case-sensitive. Under Google's rules, "Disallow: /img/" blocks only paths beginning with "/img/", not "/Img/" or "/IMG/". Your solution is valid and will solve the problem.
That being said, I would only resort to this solution if I had reason to believe the alternate-case URLs were actually being crawled, and they were causing a problem. You can easily turn your robots.txt file into an unmaintainable mess otherwise.
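If you want to see the case-sensitive matching for yourself before editing the live file, here is a minimal sketch using Python's standard-library `urllib.robotparser`. Note the assumptions: this is Python's parser, not Googlebot's, so it only illustrates the prefix-matching behaviour described above, and the helper name `allowed_paths` and the sample `logo.png` paths are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The all-lowercase rule from the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /img/
"""


def allowed_paths(robots_txt, paths, agent="*"):
    """Map each path to True (crawlable) or False (disallowed).

    Hypothetical helper for illustration; it parses the rules from a
    string instead of fetching them from a server.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


results = allowed_paths(
    ROBOTS_TXT,
    ["/img/logo.png", "/Img/logo.png", "/IMG/logo.png"],
)
for path, ok in results.items():
    print(path, "allowed" if ok else "blocked")
```

With this parser, only the lowercase path should come back as blocked; the other two spellings are not covered by the rule, which is the problem described in the question.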
As the Disallow field takes (beginnings of) URL paths as its value, and URL paths are case-sensitive, your assumption is correct.
So yes, if you want to block all URLs whose paths start with /img/ in any letter case, you’d need to add:
Disallow: /img/
Disallow: /IMG/
Disallow: /Img/
Disallow: /IMg/
Disallow: /ImG/
Disallow: /iMg/
Disallow: /iMG/
Disallow: /imG/
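Writing these lines by hand gets error-prone quickly, since a name with n letters has 2^n spellings. As a rough sketch (not something any crawler requires), the list can be generated with a short Python script. The function names `case_variants` and `disallow_lines` are made up for this example, and it assumes plain ASCII directory names.

```python
from itertools import product


def case_variants(segment):
    """Return every upper/lower-case spelling of a path segment.

    Assumes ASCII names. Characters without a case distinction
    (digits, '-', '_') contribute a single choice.
    """
    choices = [sorted({ch.lower(), ch.upper()}) for ch in segment]
    return ["".join(combo) for combo in product(*choices)]


def disallow_lines(segment):
    """Build one Disallow line per spelling of the directory name."""
    return [f"Disallow: /{variant}/" for variant in case_variants(segment)]


for line in disallow_lines("img"):
    print(line)
```

For "img" this prints the same eight rules as above. A ten-letter directory name would already need 1,024 lines, which is one more reason to add these rules only for paths that are actually being crawled in the wrong case.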