Can I use the “Host” directive in robots.txt?

Tags: seo, robots.txt

Searching for specific information on robots.txt, I stumbled upon a Yandex help page on this topic. It suggests that I could use the Host directive to tell crawlers my preferred mirror domain:

User-Agent: *
Disallow: /dir/
Host: www.example.com

Also, the Wikipedia article states that Google, too, understands the Host directive, but there was hardly any (in fact, no) further information.

At robotstxt.org, I didn’t find anything on Host (or Crawl-delay as stated on Wikipedia).

  1. Is it encouraged to use the Host directive at all?
  2. Are there any resources at Google on this robots.txt specific?
  3. How compatible is it with other crawlers?

Since at least the beginning of 2021, the linked entry no longer deals with the directive in question.

asked Feb 25 '14 by dakab



1 Answer

The original robots.txt specification says:

Unrecognised headers are ignored.

The spec calls them "headers", but this term is not defined anywhere. However, since it is mentioned in the section about the format, in the same paragraph as User-agent and Disallow, it seems safe to assume that "headers" means "field names".

So yes, you can use Host or any other field name.

  • Robots.txt parsers that support such fields, well, support them.
  • Robots.txt parsers that don’t support such fields must ignore them.

But keep in mind: as such fields are not specified by the robots.txt project, you can't be sure that different parsers support them in the same way, so you'd have to check each supporting parser manually.
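To illustrate the "must ignore" behavior, here is a minimal sketch in Python (a hypothetical hand-rolled parser, not the code of any real crawler) of a tolerant robots.txt reader that silently skips fields it does not recognize, such as Host:

# Minimal sketch of a tolerant robots.txt reader. KNOWN_FIELDS is an
# assumption for this example; Host is deliberately left out to show
# how an unrecognized field is ignored rather than treated as an error.
KNOWN_FIELDS = {"user-agent", "disallow", "allow"}

def parse_robots(text):
    records = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line or ":" not in line:
            continue
        field, _, value = line.partition(":")
        field = field.strip().lower()
        if field in KNOWN_FIELDS:
            records.append((field, value.strip()))
        # Unknown fields such as "Host:" fall through and are ignored.
    return records

example = """User-Agent: *
Disallow: /dir/
Host: www.example.com
"""
print(parse_robots(example))
# [('user-agent', '*'), ('disallow', '/dir/')]

A parser that does support Host would instead match the field and record the value; the point is that nothing in the original spec forbids the field, so both behaviors are conforming.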

answered Oct 06 '22 by unor