Regexp for robots.txt

I am trying to set up my robots.txt, but I am not sure about the regexps.

I've got four different pages, all available in three different languages. Instead of listing each page three times, I figured I could use a regexp.

nav.aspx
page.aspx/changelang (might have a query string attached, such as "?toLang=fr")
mypage.aspx?id
login.aspx/logoff (=12346?... etc., different each time)

All four exist in three different languages, e.g.:

www.example.com/es/nav.aspx
www.example.com/it/nav.aspx
www.example.com/fr/nav.aspx

Now, my question is: Is the following regexp correct?

User-Agent: *
Disallow: /*nav\.aspx$
Disallow: /*page.aspx/changelang
Disallow: /*mypage\.aspx?id
Disallow: /*login\.aspx\/logoff

Thanks

asked Jun 10 '11 by patad

1 Answer

Regular expressions are not supported in robots.txt, but Googlebot (and some other crawlers) understand some simple pattern matching: "*" matches any sequence of characters, and a trailing "$" anchors the pattern to the end of the URL.

Your robots.txt should look like this:

User-agent: *
Disallow: /*nav.aspx$
Disallow: /*page.aspx/changelang
Disallow: /*mypage.aspx?id
Disallow: /*login.aspx/logoff

The User-agent directive is valid with a lowercase "a". You don't have to escape "." or "/", since robots.txt patterns are not regular expressions, so "." has no special meaning there.
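If you want to sanity-check the rules, here is a minimal Python sketch of Googlebot-style matching. The helper name robots_pattern_to_regex and the sample URLs are made up for illustration; it just applies the two documented wildcard rules ("*" matches any sequence of characters, a trailing "$" anchors at the end of the URL):

import re

# Convert a robots.txt Disallow pattern to a regular expression,
# per Google's documented wildcard rules. (Helper name is made up
# for illustration; this is not an official API.)
def robots_pattern_to_regex(pattern):
    escaped = re.escape(pattern)            # make '.', '?', etc. literal
    escaped = escaped.replace(r"\*", ".*")  # restore the '*' wildcard
    if escaped.endswith(r"\$"):
        escaped = escaped[:-2] + "$"        # restore the end-of-URL anchor
    return re.compile("^" + escaped)

rules = [
    "/*nav.aspx$",
    "/*page.aspx/changelang",
    "/*mypage.aspx?id",
    "/*login.aspx/logoff",
]

# Sample URL paths (made up) checked against the rules above.
for url in [
    "/es/nav.aspx",                        # blocked by the first rule
    "/es/nav.aspx?x=1",                    # NOT blocked: the '$' anchor fails
    "/fr/page.aspx/changelang?toLang=fr",  # blocked by the second rule
    "/it/mypage.aspx?id=42",               # blocked by the third rule
]:
    blocked = any(robots_pattern_to_regex(p).match(url) for p in rules)
    print(url, "->", "Disallow" if blocked else "Allow")

Note the effect of the "$" on the first rule: "/es/nav.aspx" is blocked, but "/es/nav.aspx?x=1" is not. Drop the "$" if you also want to block nav.aspx URLs that carry a query string.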

You can read more about this here: Block or remove pages using a robots.txt file

answered Oct 05 '22 by aorcsik