
Ignore URLs in robots.txt with specific parameters?

Tags: seo, robots.txt

I would like Google to ignore URLs like this:

http://www.mydomain.example/new-printers?dir=asc&order=price&p=3

In other words, all URLs containing the parameters dir, order, and p should be ignored. How do I do that with robots.txt?

Asked Feb 05 '12 by Luis Valencia

People also ask

What does Disallow mean in robots.txt?

You can tell search engines not to access certain files, pages, or sections of your website. This is done with the Disallow directive in robots.txt, which is followed by the path that should not be accessed.
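For example, a minimal robots.txt that keeps all crawlers out of a hypothetical /checkout/ section (the path is only an illustration) would look like this:

User-agent: *
Disallow: /checkout/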

How do I separate URL parameters?

URL parameters are made of a key and a value, separated by an equal sign (=). Multiple parameters are separated from one another by an ampersand (&).
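In the URL from the question, for instance, order=price is one such pair: order is the key, price is the value, and the & characters separate it from dir=asc and p=3.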

Does Google respect robots.txt?

Google officially announced that GoogleBot will no longer obey a robots.txt directive related to indexing. Publishers relying on the robots.txt noindex directive had until September 1, 2019 to remove it and begin using an alternative.

Can you use regex in robots.txt?

Regular expressions are not valid in robots.txt, but Google, Bing, and some other bots do recognise some pattern matching.
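The two metacharacters commonly recognised are * (matches any sequence of characters) and $ (anchors the end of the URL). For example, a rule like the following would block any URL ending in .pdf, for crawlers that honour these patterns:

Disallow: /*.pdf$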


3 Answers

Here's a solution if you want to disallow all query strings:

Disallow: /*?*

or, if you want to be more precise about the query string:

Disallow: /*?dir=*&order=*&p=*

You can also add to the robots.txt which URLs to allow:

Allow: /new-printer$

The $ will make sure that only /new-printer itself (with nothing after it) is allowed.
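Putting the two ideas together, a minimal robots.txt sketch for the URL in the question might look like this (the path comes from the question, and the wildcard rules assume a crawler, like Googlebot, that supports * and $):

User-agent: *
Allow: /new-printers$
Disallow: /*?dir=*&order=*&p=*

With this, /new-printers stays crawlable, while /new-printers?dir=asc&order=price&p=3 is blocked.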

More info:

http://code.google.com/web/controlcrawlindex/docs/robots_txt.html

http://sanzon.wordpress.com/2008/04/29/advanced-usage-of-robotstxt-w-querystrings/

Answered by Book Of Zeus


You can block those specific query string parameters with the following lines:

Disallow: /*?*dir=
Disallow: /*?*order=
Disallow: /*?*p=

So if any URL contains dir=, order=, or p= anywhere in the query string, it will be blocked.
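For example, the URL from the question, http://www.mydomain.example/new-printers?dir=asc&order=price&p=3, matches all three rules (any one of them alone would be enough to block it), while http://www.mydomain.example/new-printers with no query string matches none of them and remains crawlable.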

Answered by Nick Rolando


Register your website with Google Webmaster Tools. There you can tell Google how it should handle your parameters.

Site Configuration -> URL Parameters

You should also have the pages that contain those parameters indicate that they should be excluded from indexing via the robots meta tag, e.g. something like the snippet below.
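A minimal sketch of that meta tag, placed in the <head> of each page to be excluded (noindex asks crawlers not to index the page; follow still lets them follow its links):

<meta name="robots" content="noindex, follow">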

Answered by Tony McCreath