 

Robots.txt: Allow subfolder but not the parent

Tags:

robots.txt

Can anybody please explain the correct robots.txt rules for the following scenario?

I would like to allow access to:

/directory/subdirectory/..

But I would also like to restrict access to /directory/, notwithstanding the above exception.

QFDev asked Sep 30 '11


3 Answers

Be aware that there is no real official standard, and any web crawler may happily ignore your robots.txt.

According to a Google Groups post, the following works, at least with Googlebot:

User-agent: Googlebot 
Disallow: /directory/ 
Allow: /directory/subdirectory/
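
If you want the same behaviour for every crawler that honours the non-standard Allow directive (an assumption; not all of them do), the same pattern with a wildcard user agent should work:

User-agent: *
Disallow: /directory/
Allow: /directory/subdirectory/

Google resolves conflicting rules by the most specific (longest) matching path, so the Allow on the subdirectory wins over the Disallow on its parent; other parsers may instead apply rules in the order they appear in the file.
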
user967058 answered Oct 01 '22


I would recommend using Google's robots.txt Tester in Google Webmaster Tools - https://support.google.com/webmasters/answer/6062598?hl=en

You can edit the file and test URLs right in the tool, and you get a wealth of other features as well.
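
If you'd rather check rules programmatically than in the web UI, Python's standard-library urllib.robotparser can answer allow/deny queries against a rule set. This is a minimal sketch using the paths from the question; note that this parser applies rules in file order rather than Google's longest-match logic, which is why the Allow line comes first here:

from urllib.robotparser import RobotFileParser

# Rules supplied inline so the example is self-contained.
rules = [
    "User-agent: *",
    "Allow: /directory/subdirectory/",   # listed first: this parser uses first-match ordering
    "Disallow: /directory/",
]

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("*", "/directory/subdirectory/page.html"))  # True
print(parser.can_fetch("*", "/directory/other.html"))              # False
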

Moojjoo answered Oct 01 '22


If these are truly directories then the accepted answer is probably your best choice. But if you're writing an application and the "directories" are dynamically generated paths (a.k.a. contexts, routes, etc.), then you might want to use meta tags instead of defining the rules in robots.txt. This gives you the advantage of not having to worry about how different crawlers may interpret/prioritize access to the subdirectory path.

You might try something like this in the code:

if is_parent_directory_path
   <meta name="robots" content="noindex, nofollow">
end
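
As a concrete version of that pseudocode, here is a minimal sketch using Flask (the framework choice and the is_parent_directory_path check are illustrative assumptions, keyed to the paths from the question):

from flask import Flask, request

app = Flask(__name__)

# Hypothetical check: top-level /directory/ pages should not be indexed,
# while anything under /directory/subdirectory/ stays indexable.
def is_parent_directory_path(path):
    return path.startswith("/directory/") and not path.startswith("/directory/subdirectory/")

@app.route("/directory/", defaults={"subpath": ""})
@app.route("/directory/<path:subpath>")
def directory(subpath):
    meta = ""
    if is_parent_directory_path(request.path):
        meta = '<meta name="robots" content="noindex, nofollow">'
    return f"<html><head>{meta}</head><body>...</body></html>"
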
Javid Jamae answered Oct 01 '22