Simple question. I want to add:
Disallow */*details-print/
Basically, blocking rules in the form of /foo/bar/dynamic-details-print
--- foo and bar in this example can also be totally dynamic.
I thought this would be simple, but then on www.robotstxt.org there is this message:
Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: bot", "Disallow: /tmp/*" or "Disallow: *.gif".
So we can't do that? Do search engines abide by it? But then, there's Quora.com's robots.txt file:
Disallow: /ajax/
Disallow: /*/log
Disallow: /*/rss
Disallow: /*_POST
So, who is right -- or am I misunderstanding the text on robotstxt.org?
Thanks!
The answer is, "it depends". The robots.txt "standard" as defined at robotstxt.org is the minimum that bots are expected to support. Googlebot, MSNbot, and Yahoo Slurp support some common extensions, and there's really no telling what other bots support. Some say what they support and others don't.
In general, you can expect the major search engine bots to support wildcards of the kind you've written, and the rule you have there looks like it will work. Your best bet is to run it past a robots.txt validator, or to use Google's Webmaster Tools to check it.
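To make the extension concrete, here is a minimal sketch (the function name is mine, not from any library) of the wildcard semantics that Google and Bing document for their crawlers: `*` matches any run of characters, a trailing `$` anchors the match to the end of the path, and an unanchored rule matches any path it is a prefix of.

```python
import re

def disallow_matches(pattern: str, path: str) -> bool:
    """Return True if `path` is covered by a Disallow `pattern`, using the
    wildcard semantics the major engines document: '*' matches any run of
    characters, a trailing '$' anchors the end of the path, and otherwise
    the pattern matches by prefix."""
    if pattern.endswith("$"):
        body, tail = pattern[:-1], ""   # anchored: path must end here
    else:
        body, tail = pattern, ".*"      # unanchored: prefix match
    regex = "".join(".*" if c == "*" else re.escape(c) for c in body) + tail
    return re.fullmatch(regex, path) is not None

# The rule from the question, against one of the dynamic URLs (with a
# trailing slash, since the rule itself ends in '/'):
disallow_matches("*/*details-print/", "/foo/bar/dynamic-details-print/")  # True

# Quora's rules behave the same way:
disallow_matches("/*/log", "/someuser/log")  # True
disallow_matches("/*/log", "/log")           # False: '*' still needs a
                                             # path segment to consume
```

Note that bots which implement only the robotstxt.org minimum will treat these patterns as literal prefixes, so a wildcard rule is effectively a no-op for them rather than an error.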