
Robots.txt, how to allow access only to domain root, and no deeper? [closed]

Tags:

robots.txt

I want to allow crawlers to access my domain's root directory (i.e. the index.html file), but nothing deeper (i.e. no subdirectories). I do not want to have to list and deny every subdirectory individually within the robots.txt file. Currently I have the following, but I think it is blocking everything, including stuff in the domain's root.

User-agent: *
Allow: /$
Disallow: /

How can I write my robots.txt to accomplish what I am trying for?

Thanks in advance!

Asked Mar 05 '11 by WASa2


1 Answer

There's nothing that will work for all crawlers. There are two options that might be useful to you.

Crawlers that support wildcards should accept something like:

Disallow: /*/

The major search engine crawlers understand the wildcards, but unfortunately most of the smaller ones don't.
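Purely as an illustration of how a wildcard-aware crawler (Googlebot-style matching) tends to interpret such a rule, here is a small Python sketch. The helper name is made up for this example and is not any crawler's actual code:

import re

def robots_rule_to_regex(rule):
    # Googlebot-style matching: a rule is matched as a prefix of the URL path,
    # '*' matches any run of characters, and '$' anchors the end of the path.
    pattern = re.escape(rule).replace(r"\*", ".*").replace(r"\$", "$")
    return re.compile(pattern)

blocked = robots_rule_to_regex("/*/")            # Disallow: /*/
print(bool(blocked.match("/subdir/page.html")))  # True  -> blocked
print(bool(blocked.match("/index.html")))        # False -> still crawlable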

If you have relatively few files in the root and you don't often add new files, you could use Allow to allow access to just those files, and then use Disallow: / to restrict everything else. That is:

User-agent: *
Allow: /index.html
Allow: /coolstuff.jpg
Allow: /morecoolstuff.html
Disallow: /

The order here is important. Crawlers are supposed to take the first match. So if your first rule was Disallow: /, a properly behaving crawler wouldn't get to the following Allow lines.

If a crawler doesn't support Allow, it will see the Disallow: / and not crawl anything on your site, provided, of course, that it ignores directives in robots.txt that it doesn't understand.

All the major search engine crawlers support Allow, and a lot of the smaller ones do, too. It's easy to implement.
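If you want to sanity-check the allow-list approach before deploying it, one option (a sketch, assuming Python is available; example.com is a placeholder domain) is the standard library's urllib.robotparser, which uses the same first-match ordering described above:

from urllib.robotparser import RobotFileParser

# The same allow-list rules as in the answer above.
rules = """\
User-agent: *
Allow: /index.html
Allow: /coolstuff.jpg
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("*", "https://example.com/index.html"))        # True
print(parser.can_fetch("*", "https://example.com/coolstuff.jpg"))     # True
print(parser.can_fetch("*", "https://example.com/subdir/page.html"))  # False

Note that this parser does not implement the * wildcard, so it can only validate the allow-list variant, not the Disallow: /*/ approach.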

Answered Nov 07 '22 by Jim Mischel