Wondering if the following will work for Google in robots.txt:
Disallow: /*.action
I need to exclude all URLs ending with .action.
Is this correct?
Crawlers and bots have specific user-agent names by which they can be recognized on a server. The robots.txt file can lay out which crawlers must follow which rules; an asterisk (*) denotes a rule for all bots. Google uses various user agents to crawl the web, the most important of which is “Googlebot.”
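For example, a robots.txt file can combine a rule for every crawler with a stricter rule for one named crawler (the paths here are hypothetical):

# Applies to every crawler
User-agent: *
Disallow: /tmp/

# Applies only to Google's main crawler
User-agent: Googlebot
Disallow: /no-google/

A crawler follows only the most specific user-agent group that matches it, so Googlebot would obey only the second block here.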
If Googlebot is being blocked unintentionally, there is a simple fix: update your robots.txt file (example.com/robots.txt) to allow Googlebot (and other crawlers) to reach your pages. You can test such changes with Google's robots.txt testing tool.
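A minimal robots.txt that allows every crawler, including Googlebot, to fetch everything looks like this; an empty Disallow value blocks nothing:

User-agent: *
Disallow: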
While typical robots.txt rules prevent crawling of a specific URL or of the pages in a directory, wildcards in your robots.txt file let you block search engines from content based on patterns in URLs, such as a query parameter or the repetition of a character.
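As an illustration, the following rules use Google's wildcard syntax to block by URL pattern rather than by exact path (the sessionid parameter is a hypothetical example):

User-agent: Googlebot
# Block any URL that contains a question mark (i.e. a query string)
Disallow: /*?
# Block any URL that contains the sessionid parameter
Disallow: /*sessionid=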
The Disallow directive in robots.txt tells search engines not to access certain files, pages, or sections of your website. The directive is followed by the path that should not be accessed.
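For example, with a hypothetical directory name:

User-agent: *
# Block every URL whose path starts with /private/
Disallow: /private/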
To block files of a specific file type (for example, .gif), use the following:
User-agent: Googlebot
Disallow: /*.gif$
So, you are close. Use Disallow: /*.action$ with a trailing "$", which anchors the match to the end of the URL.
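Put together, a robots.txt along these lines should keep Googlebot away from .action URLs (use User-agent: * instead if the rule is meant for every crawler that understands wildcards):

User-agent: Googlebot
# Block any URL whose path ends in .action
Disallow: /*.action$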
Of course, that's merely what Google suggests: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
All bots are different.
The robots.txt specification provides no way to include wildcards; Disallow values are matched only against the beginning of URIs.
Google implements non-standard wildcard extensions, described in its documentation (look in the "Manually create a robots.txt file" section under "To block files of a specific file type").
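In other words, under the plain specification you could only approximate this with prefix rules, such as blocking a whole directory, while the "ends with .action" form relies on Google's extension (the directory name below is hypothetical):

# Standard prefix matching: blocks /apps/ and everything under it
Disallow: /apps/
# Google extension: blocks any path ending in .action
Disallow: /*.action$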