I want to allow crawlers to access my domain's root directory (i.e. the index.html file), but nothing deeper (i.e. no subdirectories). I do not want to have to list and deny every subdirectory individually within the robots.txt file. Currently I have the following, but I think it is blocking everything, including stuff in the domain's root.
User-agent: *
Allow: /$
Disallow: /
How can I write my robots.txt to accomplish this?
Thanks in advance!
There's nothing that will work for all crawlers. There are two options that might be useful to you.
Robots that allow wildcards should support something like:
Disallow: /*/
The major search engine crawlers understand the wildcards, but unfortunately most of the smaller ones don't.
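For crawlers that do honor wildcards, a complete file for this case could be as short as the sketch below. Files sitting directly in the root (like /index.html) don't match the pattern because their paths contain only one slash, while anything inside a subdirectory does; crawlers that don't support wildcards may treat the * literally or ignore the rule.
User-agent: *
Disallow: /*/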
If you have relatively few files in the root and you don't often add new files, you could use Allow to allow access to just those files, and then use Disallow: / to restrict everything else. That is:
User-agent: *
Allow: /index.html
Allow: /coolstuff.jpg
Allow: /morecoolstuff.html
Disallow: /
The order here is important. Crawlers are supposed to take the first match. So if your first rule was Disallow: /, a properly behaving crawler wouldn't get to the following Allow lines.
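If you want to double-check how a first-match parser that supports Allow reads such a file, Python's standard urllib.robotparser can serve as a quick sanity check. This is just a sketch; the example.com URLs and file names are placeholders.
from urllib.robotparser import RobotFileParser

# The rules from the example above, with the Allow lines listed first.
rules = """\
User-agent: *
Allow: /index.html
Allow: /coolstuff.jpg
Allow: /morecoolstuff.html
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Root files listed with Allow are fetchable; everything else is not.
print(parser.can_fetch("*", "https://example.com/index.html"))        # True
print(parser.can_fetch("*", "https://example.com/somedir/page.html")) # False

# Moving Disallow: / to the top would make a first-match parser like this
# one return False for every URL, since nothing would reach the Allow lines.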
If a crawler doesn't support Allow, then it's going to see the Disallow: / and not crawl anything on your site. Provided, of course, that it ignores things in robots.txt that it doesn't understand.
All the major search engine crawlers support Allow, and a lot of the smaller ones do, too. It's easy to implement.