I am creating two pages on my site that are very similar but serve different purposes. One is to thank users for leaving a comment and the other is to encourage users to subscribe.
I don't want the duplicate content but I do want the pages to be available. Can I set the sitemap to hide one? Would I do this in the robots.txt file?
The disallow looks like this:
Disallow: /wp-admin
How would I customize this for a specific page like:
http://sweatingthebigstuff.com/thank-you-for-commenting
The Disallow directive (added within a website's robots.txt file) instructs search engines not to crawl a page on the site. Note that disallowing a page does not guarantee it stays out of search results: a page that is blocked from crawling can still be indexed if other pages link to it. To reliably keep a page out of results, use a noindex meta tag instead (and do not disallow the page, since the crawler must fetch it to see the tag).
To prevent specific articles on your site from being indexed by all robots, use the following meta tag: <meta name="robots" content="noindex, nofollow">. To prevent robots from crawling images on a specific article, use the following meta tag: <meta name="robots" content="noimageindex">.
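For example, the meta tag goes in the page's `<head>`. A minimal sketch for the thank-you page (the title text is just illustrative):

```html
<!-- Tells all robots not to index this page or follow its links -->
<head>
  <meta name="robots" content="noindex, nofollow">
  <title>Thank You for Commenting</title>
</head>
```

In WordPress, SEO plugins such as Yoast expose a per-page "noindex" option that emits this tag for you, so you may not need to edit the template directly.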
By default, running a scrapy crawl command for a project will first fetch the site's robots.txt file and abide by its rules. You can make your Scrapy spider ignore robots.txt by setting the ROBOTSTXT_OBEY setting to False.
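For reference, that setting lives in the Scrapy project's settings.py. A minimal sketch:

```python
# settings.py -- Scrapy project configuration.
# ROBOTSTXT_OBEY defaults to True in newly generated projects; setting it
# to False makes spiders skip fetching and honoring robots.txt entirely.
ROBOTSTXT_OBEY = False
```

It can also be overridden per-spider via the spider's custom_settings dictionary.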
Add this line to your robots.txt:

Disallow: /thank-you-for-commenting
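Putting it together, a minimal robots.txt for this case might look like the following (the paths are the ones from the question; adjust to match your actual URLs):

```
User-agent: *
Disallow: /wp-admin
Disallow: /thank-you-for-commenting
```

Disallow values are matched as path prefixes, so this rule also blocks any URL that starts with /thank-you-for-commenting.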
Take a look at last.fm's robots.txt file for inspiration.