Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Precedence of X-Robots-Tag header vs robots meta tag

I've placed the following Header in my vhost config:

Header set X-Robots-Tag "noindex, nofollow"

The goal here is to just disable search engines from indexing my testing environment. The site is Wordpress and there is a plugin installed to manage per-page the meta robots settings. For example:

<meta name="robots" content="index, follow" />

So my question is, which directive will take precedence over the other since both are being set on every page?

like image 206
Jared Eitnier Avatar asked Jul 29 '13 17:07

Jared Eitnier


People also ask

What is X-robots-tag header?

txt file and meta robots tag, though, in that the X-Robots-Tag is a part of the HTTP header that controls indexing of a page on the whole, in addition to specific elements on a page. According to Google: “Any directive that can be used in a robots meta tag can also be specified as an X-Robots-Tag.”

What is the difference between meta robots and robots txt?

Robots. txt files are best for disallowing a whole section of a site, such as a category whereas a meta tag is more efficient at disallowing single files and pages. You could choose to use both a meta robots tag and a robots.

What is the value to be set for X-robots-tag header to prevent the crawlers from indexing a page?

HTTP response header Instead of a meta tag, you can also return an X-Robots-Tag header with a value of either noindex or none in your response.

Is robots meta tag necessary?

Meta Robots Directives on a Page Blocked By Robots. If a page has never been indexed, a robots. txt disallow rule should be sufficient to prevent this from showing in search results, but it is still recommended that a meta robots tag is added.


3 Answers

I am not sure if a definitive answer can be given to the question, as the behavior may be implementation-dependent (on the robot side).

However, I think there is reasonable evidence that X-Robots-Tag will take precedence over <meta name="robots" .... See :

One significant difference between the X-Robots-Tag and the robots meta directive is:

  • X-Robots-Tag is part of the HTTP protocol header.
  • <meta name="robots" ... is part of the HTML document header.

Therefore the the X-Robots-Tag belongs to HTTP protocol layer, while <meta name="robots" ... belongs to the HTML protocol layer.

Protocol capture

As they belong to a different protocol layer, they will not be parsed simultaneously by the (robot) client getting the page: The HTTP layer will be parsed first, and the HTML in a later step.

(Also, it should be noted that X-Robots-Tag and <meta name="robots" ... are not suppported by all robots. Google and Yahoo/Bing suppport both, but according to this some support only <meta name="robots" ..., others support neither.)

Summary :

  • if supported by the robot, X-Robots-Tag will be processed first ; restrictions (noindex, nofollow) apply (and <meta name="robots" ... is ignored).
  • else, <meta name="robots" ... directive applies.
like image 75
d-stroyer Avatar answered Sep 24 '22 08:09

d-stroyer


Just an update to Dan's experience, I also have both the

Header set X-Robots-Tag "noindex, nofollow"

and

<meta name="robots" content="index, follow" />

on my one of my Wordpress sites, and a check in Google Search Console confirmed that the noindex in X-Robots-Tag is taking precedence as the pages have been crawled and but aren't indexed. So the logic in the correct answer is indeed, correct.

like image 27
Marie Avatar answered Sep 20 '22 08:09

Marie


In my recent experience, when Google sees mixed-messages it prefers positive action by default - ie - it favours indexing - meanwhile will flag the issue as a critical error/warning in your webmaster tools console if you have one.

see your site's status in google here: https://www.google.com/webmasters/

see you site's status in bing here: http://www.bing.com/toolbox/webmaster (note that yahoo search is now powered by bing)

Google takes this positive-by-default action because lots of site owners unwittingly have a dodgy cms semi-blocking robots and we know how google loves to accumulate as much data as it can - any excuse!

if the technical settings are erroneous they're liable to be totally disregarded, and we know how search engines index and follow by default when no settings are specified.

like image 27
Dan Avatar answered Sep 24 '22 08:09

Dan