 

How do I disallow a specific page in robots.txt?

Tags:

robots.txt

I am creating two pages on my site that are very similar but serve different purposes. One is to thank users for leaving a comment and the other is to encourage users to subscribe.

I don't want the duplicate content but I do want the pages to be available. Can I set the sitemap to hide one? Would I do this in the robots.txt file?

The disallow looks like this:

Disallow: /wp-admin

How would I customize it for a specific page like:

http://sweatingthebigstuff.com/thank-you-for-commenting

Daniel asked Aug 15 '10

People also ask

What is disallow search in robots txt?

The disallow directive (added within a website's robots.txt file) is used to instruct search engines not to crawl a page on a site. This will normally also prevent the page from appearing in search results.

How do you stop robots from looking at things on a website?

To prevent specific articles on your site from being indexed by all robots, use the following meta tag: <meta name="robots" content="noindex, nofollow">. To prevent robots from crawling images on a specific article, use the following meta tag: <meta name="robots" content="noimageindex">.

How do I ignore robots txt?

Running a scrapy crawl command for a project will first fetch the robots.txt file and abide by its rules. You can make your Scrapy spider ignore robots.txt by setting the ROBOTSTXT_OBEY option to False.
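
As a minimal sketch, this setting goes in the Scrapy project's settings.py (the project name below is only a placeholder):

# settings.py of a Scrapy project ("myproject" is a placeholder name)
BOT_NAME = "myproject"

# Projects generated by `scrapy startproject` enable this by default, which makes
# Scrapy download robots.txt and skip disallowed URLs before crawling.
# Setting it to False tells Scrapy to ignore robots.txt entirely.
ROBOTSTXT_OBEY = False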


1 Answer

Disallow: /thank-you-for-commenting 

in robots.txt

Take a look at last.fm's robots.txt file for inspiration.
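
As a rough sketch, a robots.txt covering the pages in the question could look like the following (only the thank-you URL comes from the question; the subscribe path is an assumed example):

User-agent: *
Disallow: /thank-you-for-commenting
# Hypothetical slug for the subscribe page; use whatever path that page actually has
Disallow: /subscribe-to-comments

The file must sit at the site root, e.g. http://sweatingthebigstuff.com/robots.txt.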

Alex answered Oct 04 '22