
Robots.txt: Is this wildcard rule valid?

Tags:

seo

robots.txt

Simple question. I want to add:

Disallow: */*details-print/

Basically, I want to block URLs of the form /foo/bar/dynamic-details-print, where foo and bar in this example can also be totally dynamic.
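
To illustrate the intent, here's a rough Python sketch of the matching I'm hoping the * gives me (shell-style globbing via fnmatch; the example paths are made up, and this is just my mental model, not a claim about what robots.txt guarantees):

from fnmatch import fnmatchcase

rule = "*/*details-print/"
examples = [
    "/foo/bar/dynamic-details-print/",   # want this blocked
    "/news/2011/story-details-print/",   # want this blocked too
    "/foo/bar/details/",                 # should stay crawlable
]
for path in examples:
    # Append "*" because robots.txt rules are prefix matches.
    print(path, "->", "block" if fnmatchcase(path, rule + "*") else "allow")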

I thought this would be simple, but then on www.robotstxt.org there is this message:

Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: bot", "Disallow: /tmp/*" or "Disallow: *.gif".

So we can't do that? Do search engines abide by it? But then, there's Quora.com's robots.txt file:

Disallow: /ajax/
Disallow: /*/log
Disallow: /*/rss
Disallow: /*_POST

So, who is right? Or am I misunderstanding the text on robotstxt.org?

Thanks!

Bartek asked Jan 28 '11 21:01



1 Answer

The answer is, "it depends". The robots.txt "standard" as defined at robotstxt.org is the minimum that bots are expected to support. Googlebot, MSNbot, and Yahoo Slurp support some common extensions, and there's really no telling what other bots support. Some say what they support and others don't.

In general, you can expect the major search engine bots to support the wildcards you've written, and the rule you have there looks like it will work. Your best bet is to run it past one of the online robots.txt validators or use Google's Webmaster Tools to check it.
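
If you want to sanity-check the pattern yourself, here's a minimal Python sketch of Googlebot-style wildcard matching, assuming the documented semantics ("*" matches any run of characters, a trailing "$" anchors the end of the URL, and rules match from the start of the path). It's an illustration, not a full robots.txt parser:

import re

def disallow_to_regex(value):
    # Translate a wildcard Disallow value (e.g. "*/*details-print/") into a
    # regular expression, under the assumed semantics described above.
    anchored = value.endswith("$")
    if anchored:
        value = value[:-1]
    body = re.escape(value).replace(r"\*", ".*")   # only "*" is a wildcard
    return re.compile(body + ("$" if anchored else ""))

rule = disallow_to_regex("*/*details-print/")
for path in ("/foo/bar/dynamic-details-print/page", "/foo/bar/details/"):
    print(path, "->", "blocked" if rule.match(path) else "allowed")

Either way, the validators will give you the authoritative answer for each specific crawler.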

Jim Mischel answered Sep 18 '22 17:09