
How to configure robots.txt file to block all but 2 directories

I don't want search engines to index most of my website.

I do, however, want search engines to index 2 folders (and their children). This is what I set up, but I don't think it works: I still see pages in Google that I wanted to hide.

Here's my robots.txt:

User-agent: *
Allow: /archive/
Allow: /lsic/
User-agent: *
Disallow: /

What's the correct way to disallow all folders except for 2?

jeph perro asked Jun 23 '11 21:06


1 Answer

I gave a tutorial about this on this forum here, and in Wikipedia here.

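Basically, put all the rules in a single User-agent group and list the Allow lines before the Disallow. Parsers that follow the original robots.txt convention apply the first matching pattern, while Google applies the most specific (longest) matching rule, and this ordering gives the same result under both: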

User-agent: *
Allow: /archive/
Allow: /lsic/
Disallow: /
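
If you want to check the rules before deploying them, here is a minimal sketch using Python's standard-library urllib.robotparser, which applies rules in first-match order. The example.com host, the sample paths, and the is_allowed helper are all made up for illustration; real crawlers may interpret edge cases differently, so treat this as a sanity check rather than proof of how Google behaves.

```python
from urllib.robotparser import RobotFileParser

# The corrected file from this answer: one group, Allow lines first.
ROBOTS_TXT = """\
User-agent: *
Allow: /archive/
Allow: /lsic/
Disallow: /
"""


def is_allowed(path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if `agent` may fetch `path` under `robots_txt`.

    Hypothetical helper: example.com is a placeholder host, only the
    path part matters for rule matching.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "https://example.com" + path)


if __name__ == "__main__":
    # Sample paths are assumptions, not taken from the asker's site.
    for path in ("/archive/2011/post.html", "/lsic/", "/private/page.html", "/"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Paths under /archive/ and /lsic/ should come back allowed and everything else blocked.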

But I suspect it might be too late. Once a page is indexed it's pretty hard to remove, and robots.txt only stops crawling; it does not take already-indexed pages out of the results. The usual workarounds are to move the content to another folder or to password protect the folder. You should be able to do that in your host's cPanel.

T9b answered Oct 22 '22 14:10