
Google robots.txt for http site after redirection to https

The Google robots.txt specification states that a robots.txt file at http://example.com/robots.txt is not valid for https://example.com. Presumably the reverse is also true.

It also has this to say about following redirects when requesting a robots.txt:

3xx (redirection)

Redirects will generally be followed until a valid result can be found (or a loop is recognized). We will follow a limited number of redirect hops (RFC 1945 for HTTP/1.0 allows up to 5 hops) and then stop and treat it as a 404. Handling of robots.txt redirects to disallowed URLs is undefined and discouraged. Handling of logical redirects for the robots.txt file based on HTML content that returns 2xx (frames, JavaScript, or meta refresh-type redirects) is undefined and discouraged.

Say I set up a website so that all requests over http are permanently redirected (301) to the equivalent URL on https. Google will request http://example.com/robots.txt and follow the redirect to https://example.com/robots.txt. Will that file count as the valid robots.txt for the http site, because that was the original request, or will Google conclude that the http site has no valid robots.txt?

GC. asked Nov 17 '25 22:11



1 Answer

Testing with the robots.txt tester in Google Search Console confirmed that the redirected robots.txt (served from https) is used as the robots file for the original http domain.
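To make that behaviour concrete, here is a minimal Python sketch of the redirect handling described in the quoted spec text. It is only an illustration under stated assumptions, not Google's actual implementation: `resolve_robots`, `fake_fetch` and `FAKE_SITE` are hypothetical names, the hop limit of 5 is taken from the RFC 1945 remark in the quote, and treating a detected loop as a 404 is my own assumption. No network access is involved; the "site" is a dictionary.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

MAX_HOPS = 5  # assumption: the hop limit mentioned in the quoted spec text


def resolve_robots(start_url, fetch, max_hops=MAX_HOPS):
    """Follow 3xx responses starting at start_url.

    `fetch` is a hypothetical callable: url -> (status, location, body).
    Returns (status, final_url, body). Too many hops, or a redirect loop,
    is reported as a 404 (the loop case is an assumption of this sketch).
    """
    url = start_url
    seen = {url}
    for _ in range(max_hops + 1):
        status, location, body = fetch(url)
        if 300 <= status < 400 and location:
            url = urljoin(url, location)
            if url in seen:  # redirect loop recognised
                return 404, url, ""
            seen.add(url)
            continue
        return status, url, body
    return 404, url, ""  # hop limit exceeded


# Hypothetical site: http permanently redirects to https.
FAKE_SITE = {
    "http://example.com/robots.txt": (301, "https://example.com/robots.txt", ""),
    "https://example.com/robots.txt": (200, None, "User-agent: *\nDisallow: /private/\n"),
}


def fake_fetch(url):
    return FAKE_SITE.get(url, (404, None, ""))


status, final_url, body = resolve_robots("http://example.com/robots.txt", fake_fetch)

# The rules found at the end of the redirect chain are applied to the
# origin that was originally requested (the http site).
rules = RobotFileParser()
rules.parse(body.splitlines())
private_allowed = rules.can_fetch("*", "http://example.com/private/page")
public_allowed = rules.can_fetch("*", "http://example.com/public/page")
```

In this sketch the request for the http robots.txt ends at the https file with a 200, and the rules from that file are then evaluated against http URLs, which matches what the tester reported.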

Answer provided by Barry Hunter on the Webmaster Central forum: https://productforums.google.com/forum/#!topic/webmasters/LLDVaso5QP8

GC. answered Nov 19 '25 14:11
