I have a small Magento site which consists of page URLs such as:
http://www.example.com/contact-us.html
http://www.example.com/customer/account/login/
However, I also have pages which include filters (e.g. price and colour); two such examples are:
http://www.example.com/products.html?price=1%2C1000
http://www.example.com/products/chairs.html?price=1%2C1000
The issue is that when Googlebot and the other search engine bots crawl the site, crawling essentially grinds to a halt because they get stuck in all the "filter links".
So, how can the robots.txt file be configured, e.g.:
User-agent: *
Allow:
Disallow:
To allow all pages like:
http://www.example.com/contact-us.html
http://www.example.com/customer/account/login/
to get indexed, but in the case of http://www.example.com/products.html?price=1%2C1000, to index products.html while ignoring all the content after the ?.
The same should apply for http://www.example.com/products/chairs.html?price=1%2C1000.
I also don't want to have to specify each page in turn; I just want a rule to ignore everything after the ?, but not the main page itself.
I think this will do it:
User-Agent: *
Disallow: /*?
That will disallow any URL that contains a question mark.
If you want to disallow just those that have ?price, you would write:
Disallow: /*?price
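Putting that together for the site in the question, the whole file could be as small as this (a sketch; no Allow lines are needed, because anything not matched by a Disallow rule is crawlable by default):

User-agent: *
# Blocked:  /products.html?price=1%2C1000
# Blocked:  /products/chairs.html?price=1%2C1000
# Crawled:  /contact-us.html
# Crawled:  /customer/account/login/
Disallow: /*?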
See related questions such as:
Restrict robot access for (specific) query string (parameter) values?
How to disallow search pages from robots.txt
Additional explanation:
The syntax Disallow: /*? says, "disallow any URL that has a question mark in it." The / is the start of the path-and-query part of the URL. So if your URL is http://mysite.com/products/chairs.html?manufacturer=128&usage=165, the path-and-query part is /products/chairs.html?manufacturer=128&usage=165. The * says "match any sequence of characters." So Disallow: /*? will match /<anything>?<more stuff> -- anything that has a question mark in it.
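If you want to sanity-check a pattern before deploying it, here is a minimal Python sketch of the wildcard matching described above (match_rule is a hypothetical helper written just for this illustration; it models Googlebot-style * expansion only and ignores details such as the $ end anchor and Allow/Disallow precedence):

import re
from urllib.parse import urlsplit

def match_rule(rule_path, url):
    # Extract the path-and-query part of the URL,
    # e.g. '/products/chairs.html?price=1%2C1000'.
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Translate the rule into a regex: each '*' matches any
    # sequence of characters; everything else is literal.
    pattern = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    # re.match anchors at the start, just like a robots.txt rule.
    return re.match(pattern, path) is not None

print(match_rule("/*?", "http://www.example.com/contact-us.html"))                      # False
print(match_rule("/*?", "http://www.example.com/products/chairs.html?price=1%2C1000"))  # True
print(match_rule("/*?price", "http://www.example.com/products.html?price=1%2C1000"))    # True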