
Meta tag vs robots.txt

  1. Is it better to use meta tags* or the robots.txt file for informing spiders/crawlers to include or exclude a page?

  2. Are there any issues in using both the meta tags and the robots.txt?

*Eg: <meta name="robots" content="index, follow">

asked Jul 27 '10 by keruilin


People also ask

Is robots meta tag necessary?

Some content on your site is not necessary for search engines to index. To prevent indexing of such pages, you can use a robots meta tag or an X-Robots-Tag response header.

Is robots txt necessary for SEO?

No, a robots.txt file is not required for a website. If a bot visits your website and the file doesn't exist, it will simply crawl your site and index pages as it normally would.

Is robots txt obsolete?

No. Back in September 2019, Google stopped supporting the unofficial robots.txt noindex directive. While its use should always have been a last resort, the directive is completely useless now.

What is the difference between robots txt and noindex?

If you want content not to be included in search results, use noindex. If you want to stop search engines from crawling a directory on your server because it contains nothing they need to see, use a Disallow directive in your robots.txt file.


2 Answers

There is one significant difference. According to Google, they will still index a page disallowed in robots.txt if the page is linked to from another site.

However, they will not if they see a noindex meta tag:

While Google won't crawl or index the content blocked by robots.txt, we might still find and index a disallowed URL from other places on the web. As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the site can still appear in Google search results. You can stop your URL from appearing in Google Search results completely by using other URL blocking methods, such as password-protecting the files on your server or using the noindex meta tag or response header.
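As the quoted guidance notes, noindex can be delivered two ways: as a meta tag in the page's HTML, or as an X-Robots-Tag HTTP response header (useful for non-HTML files such as PDFs). A minimal sketch of the meta tag form:

```html
<!-- Placed in the <head> of the page that should stay out of search results -->
<meta name="robots" content="noindex">
```

The equivalent response header is `X-Robots-Tag: noindex`, sent by the server alongside the file.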

answered Sep 20 '22 by user2696762


Both are supported by all crawlers that respect webmasters' wishes. Not all crawlers do, and against those neither technique is sufficient.

You can use robots.txt rules for general things, like disallowing whole sections of your site. If you say Disallow: /family, then crawlers will not fetch any URL starting with /family, so that content is not indexed directly.
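For example, a minimal robots.txt blocking the /family section mentioned above might look like this (the file must be served from the site root, e.g. https://example.com/robots.txt, where example.com is a placeholder):

```
# Applies to all compliant crawlers
User-agent: *
# Blocks /family and every URL beginning with it
Disallow: /family
```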

A meta tag can be used to disallow a single page. Pages disallowed by meta tags do not affect sub-pages in the page hierarchy: a noindex meta tag on /work does not prevent a crawler from accessing /work/my-publications if an allowed page links to it.
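A sketch of that per-page control, using the /work page from the example above: the tag below on /work keeps that single page out of the index, while /work/my-publications remains crawlable if something links to it.

```html
<!-- In the <head> of /work only; sub-pages are unaffected -->
<meta name="robots" content="noindex, follow">
```

Here "follow" tells the crawler it may still follow links on the page even though the page itself should not be indexed.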

answered Sep 19 '22 by jmz