Is it better to use meta tags* or the robots.txt file for informing spiders/crawlers to include or exclude a page?
Are there any issues in using both the meta tags and the robots.txt?
*E.g.: <META name="robots" content="index, follow">
Some content on your site does not need to be indexed by search engines. To prevent indexing of unnecessary pages, you can use a robots meta tag or the X-Robots-Tag response header. However, it's not uncommon for robots.
No, a robots.txt file is not required for a website. If a bot comes to your website and finds no robots.txt file, it will just crawl the site and index pages as it normally would.
No. In September 2019, Google stopped supporting the unofficial noindex directive in robots.txt. Its use should always have been a last resort, and the directive is now completely useless.
So if you do not want content to be included in search results, use noindex. If you want to stop search engines from crawling a directory on your server because it contains nothing they need to see, use a "Disallow" directive in your robots.txt file.
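There is one significant difference. According to Google, they may still index the URL of a page blocked by a robots.txt Disallow rule if the page is linked to from another site.
However, they will not index a page on which they see a noindex meta tag. This is also the main issue with using both: the crawler has to fetch the page to see the tag, so a page that carries noindex should not also be disallowed in robots.txt. In Google's words: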
While Google won't crawl or index the content blocked by robots.txt, we might still find and index a disallowed URL from other places on the web. As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the site can still appear in Google search results. You can stop your URL from appearing in Google Search results completely by using other URL blocking methods, such as password-protecting the files on your server or using the noindex meta tag or response header.
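The "response header" mentioned above is X-Robots-Tag, which is useful for files that cannot carry a meta tag, such as PDFs. The following is only a minimal sketch of a WSGI application that adds the header for some paths. The prefix /private/ and the names NOINDEX_PREFIXES and app are hypothetical and chosen for illustration; a real site would normally set this header in its web server or framework configuration.

```python
# Hypothetical path prefixes whose responses should not be indexed.
NOINDEX_PREFIXES = ("/private/",)


def app(environ, start_response):
    """Tiny WSGI app that sends X-Robots-Tag: noindex for selected paths."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIXES):
        # Header equivalent of <meta name="robots" content="noindex">.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"example body\n"]


if __name__ == "__main__":
    # Serve on localhost:8000 for manual inspection, e.g. with curl -I.
    from wsgiref.simple_server import make_server

    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

As with the meta tag, a crawler only sees this header if robots.txt allows it to request the URL in the first place.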
Both are supported by all crawlers that respect webmasters' wishes. Not all crawlers do, and against those that ignore them neither technique is sufficient.
You can use robots.txt rules for general things, such as disallowing whole sections of your site. If you say Disallow: /family, then no URL whose path starts with /family is crawled by a compliant crawler.
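To see how such a prefix rule is evaluated, here is a small sketch using Python's standard urllib.robotparser module. The robots.txt content is made up for this example (only the /family rule comes from the text above), and can_crawl is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; /family is the example path from the text.
ROBOTS_TXT = """\
User-agent: *
Disallow: /family
"""


def can_crawl(path: str, user_agent: str = "*") -> bool:
    """Return True if the rules above allow user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/family", "/family/photos", "/work"):
        print(path, can_crawl(path))
```

Because the rule is a plain prefix match, it covers /family/photos as well as /family itself, while /work is unaffected.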
A meta tag can be used to exclude a single page. A meta tag on one page does not affect subpages in the page hierarchy. If you have a disallowing meta tag (such as noindex) on /work, it does not prevent a crawler from accessing /work/my-publications if there is a link to it on an allowed page.
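The per-page nature of the meta tag can be illustrated with a short sketch based on Python's standard html.parser module. It reads the robots directives from each page's own HTML. The two page bodies and the names PAGES, RobotsMetaParser and robots_directives are assumptions made up for this example, not part of any crawler's actual implementation.

```python
from html.parser import HTMLParser

# Hypothetical page bodies; the paths are the ones used in the text.
PAGES = {
    "/work": '<html><head><meta name="robots" content="noindex, follow"></head></html>',
    "/work/my-publications": "<html><head><title>Publications</title></head></html>",
}


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> set[str]:
    """Return the set of robots meta directives found in one page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    for path, html in PAGES.items():
        print(path, sorted(robots_directives(html)))
```

Each page is judged only by its own tag: the noindex on /work says nothing about /work/my-publications, which would need its own tag to be excluded.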