 

Can I use robots.txt to block certain URL parameters?

Tags:

robots.txt

Before you tell me 'what have you tried' and 'test this yourself', I'd like to note that robots.txt changes are picked up awfully slowly by search engines for any site, so if you could offer theoretical experience, that would be appreciated.

For example, is it possible to allow:

http://www.example.com

And block:

http://www.example.com/?foo=foo

I'm not very sure.

Help?

Asked Jan 02 '13 by Lucas

People also ask

Can the canonical URL be blocked by robots.txt?

If you disallow a URL in robots.txt, bots can't read the canonical tag on that page and so can't follow its instructions. This means that any links the page has acquired no longer pass SEO equity to the source material.

When should you use a robots.txt file?

You can use a robots.txt file for web pages (HTML, PDF, or other non-media formats that Google can read) to manage crawling traffic if you think your server will be overwhelmed by requests from Google's crawler, or to avoid crawling unimportant or similar pages on your site.

What should be disallowed in robots.txt?

You can tell search engines not to access certain files, pages, or sections of your website. This is done with the Disallow directive in robots.txt, as in the sketch below.
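
A minimal sketch, with purely hypothetical paths, that keeps all crawlers out of two low-value sections of a site:

# hypothetical example: block two sections for every crawler
User-agent: *
Disallow: /private/
Disallow: /search/

Anything not matched by a Disallow line remains crawlable by default.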


1 Answer

According to Wikipedia, "The robots.txt patterns are matched by simple substring comparisons", and since the query string is part of the URL, you should be able to just add:

Disallow: /?foo=foo

or something fancier, like

Disallow: /*?* 

to block all query strings. The asterisk is a wildcard that matches zero or more characters of anything; note that wildcard support is an extension honored by major crawlers such as Googlebot and Bingbot rather than part of the original robots.txt standard.

Example of a robots.txt with dynamic URLs.
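
Since robots.txt changes are slow to test against live search engines, one way to sanity-check the rule locally is Python's standard-library parser. This is only a sketch: urllib.robotparser implements the original prefix-matching rules, not the wildcard extension, so only the plain Disallow: /?foo=foo form can be verified this way, and www.example.com is just the placeholder host from the question.

import urllib.robotparser

# The simple, prefix-matched rule from the answer above.
rules = [
    "User-agent: *",
    "Disallow: /?foo=foo",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)  # parse in-memory lines; nothing is fetched over the network

# The site root stays crawlable (no rule matches it)...
print(rp.can_fetch("*", "http://www.example.com/"))          # True
# ...while the parameterized URL is caught by the Disallow rule.
print(rp.can_fetch("*", "http://www.example.com/?foo=foo"))  # False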

Answered Oct 06 '22 by Sean Dawson