
How can I get robots.txt to block URLs that contain a "?" character but still index the page itself?

I have a small Magento site with page URLs such as:

http://www.example.com/contact-us.html
http://www.example.com/customer/account/login/

However I also have pages which include filters (e.g. price and colour) and two such examples are:

http://www.example.com/products.html?price=1%2C1000
http://www.example.com/products/chairs.html?price=1%2C1000

The issue is that when Googlebot and other search engine bots crawl the site, it essentially grinds to a halt because they get stuck in all the "filter links".

So how can the robots.txt file be configured, e.g.:

User-agent: *
Allow:
Disallow: 

To allow all pages like:

http://www.example.com/contact-us.html
http://www.example.com/customer/account/login/

to get indexed, but in the case of http://www.example.com/products.html?price=1%2C1000, index products.html and ignore everything after the ?. The same should apply to http://www.example.com/products/chairs.html?price=1%2C1000.

I also don't want to have to specify each page in turn; I just want a rule that ignores everything after the ? but not the main page itself.

Christine M. Reaves asked Sep 16 '11 22:09


1 Answer

I think this will do it:

User-Agent: *
Disallow: /*?

That will disallow any URL that contains a question mark.

If you want to disallow just those that have ?price, you would write:

Disallow: /*?price

See related questions such as:

Restrict robot access for (specific) query string (parameter) values?

How to disallow search pages from robots.txt

Additional explanation:

The syntax Disallow: /*? says, "disallow any URL that has a question mark in it." The / is the start of the path-and-query part of the URL. So if your URL is http://mysite.com/products/chairs.html?manufacturer=128&usage=165, the path-and-query part is /products/chairs.html?manufacturer=128&usage=165. The * says "match any sequence of characters", including none. So Disallow: /*? will match /<anything>?<more stuff> -- anything that has a question mark in it. The page without a query string, such as /products/chairs.html, contains no question mark, so it is not matched and can still be crawled.
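
To check a rule against sample URLs before deploying it, the matching described above can be sketched in a few lines of Python. This is only an illustration, assuming the wildcard semantics used by the major crawlers (* matches any run of characters, a trailing $ anchors the end, everything else is a literal prefix). The function name rule_matches and the colour=red parameter are made up for the example, and a real crawler may differ in details such as percent-encoding.

```python
import re


def rule_matches(pattern: str, path_and_query: str) -> bool:
    """Return True if a robots.txt path pattern matches the path-and-query.

    Assumed semantics: '*' matches any run of characters (including none),
    a trailing '$' anchors the end, and the rest is a literal prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece so '?' and '.' are not read as regex operators,
    # then join the pieces with '.*' to stand in for the '*' wildcard.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives the prefix behaviour.
    return re.match(regex, path_and_query) is not None


# Filter URLs are caught by "/*?", plain pages are not.
print(rule_matches("/*?", "/products.html?price=1%2C1000"))
print(rule_matches("/*?", "/products/chairs.html"))
print(rule_matches("/*?", "/contact-us.html"))

# "/*?price" only matches when price is the first parameter after the '?'.
print(rule_matches("/*?price", "/products/chairs.html?price=1%2C1000"))
print(rule_matches("/*?price", "/products/chairs.html?colour=red&price=1%2C1000"))
```

The last call is worth noting: because /*?price looks for the literal text ?price, a URL where price is not the first parameter would slip past that narrower rule, which is one reason the broader Disallow: /*? may be the safer choice here.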

Jim Mischel answered Sep 27 '22 17:09