
Googlebot and empty CORS responses

We have a React app that loads some data asynchronously from another domain. The requests are made using isomorphic-fetch in cors mode and the requests and responses all look fine and work correctly when testing using my own browser.

We monitor the responses and log failures back to our application for analysis.

While most of the time all is well (and everything seems to be getting indexed correctly and showing up fine in Google), we still see a lot of failures, only for Googlebot, where it fails to fetch the data correctly. Debugging the response object, I see that the status is 200 but the statusText is empty. The response has no body (and so no .json or .text methods) and no headers (which shouldn't be the case), and the response type is correctly reported as cors (not opaque, which would have explained some of the other oddities).
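
For anyone who wants to capture the same details, here is a minimal sketch of the kind of snapshot that could be logged for each response. The helper names describeResponse and looksEmpty are hypothetical (they are not part of our app or of isomorphic-fetch), and the "empty" rule is only an assumption based on the symptoms described above.

```javascript
// Hypothetical helper: copy the interesting fields of a fetch Response
// into a plain object so it can be serialised and logged.
function describeResponse(response) {
  const headerNames = [];
  // Headers.forEach passes (value, name); only headers exposed to the
  // page under CORS will show up here.
  if (response.headers && typeof response.headers.forEach === 'function') {
    response.headers.forEach((value, name) => headerNames.push(name));
  }
  return {
    status: response.status,
    statusText: response.statusText,
    type: response.type,
    url: response.url,
    ok: response.ok,
    bodyUsed: response.bodyUsed,
    hasJsonMethod: typeof response.json === 'function',
    hasTextMethod: typeof response.text === 'function',
    headerNames,
  };
}

// Assumption: a response counts as "empty" when it claims status 200 but
// exposes no headers or has no .json method, matching the failures above.
function looksEmpty(snapshot) {
  return snapshot.status === 200 &&
    (snapshot.headerNames.length === 0 || !snapshot.hasJsonMethod);
}
```

Calling looksEmpty(describeResponse(response)) before response.json() would let a failure be logged with the full snapshot, rather than with only the error thrown when the missing method is called.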

From my understanding of CORS, this all looks above board in terms of the headers being sent and received, so why is Googlebot having so many intermittent problems? Googlebot reports an HTTP 200 response (successful; the Promise is not rejected), but it's missing everything that should come with an HTTP 200 response: it has no body and no headers exposed. Why is Googlebot failing to return a response with headers and a body (as described below)?

A normal preflight request looks like this (from Chrome devtools; the extra backslash in */\* was added to stop SO thinking that it's a comment opener)

Accept:*/\*
Accept-Encoding:gzip, deflate, sdch, br
Accept-Language:en-GB,en-US;q=0.8,en;q=0.6
Access-Control-Request-Headers:content-type, x-apikey
Access-Control-Request-Method:POST
Cache-Control:no-cache
Connection:keep-alive
DNT:1
Host:my.host.net
Origin:http://my.origin.net
Pragma:no-cache
Referer:http://my.origin.net/
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.100 Safari/537.36

And the preflight response looks like this

Access-Control-Allow-Headers:content-type,x-apikey
Access-Control-Allow-Origin:*
Cache-Control:no-cache
Connection:keep-alive
Content-Length:0
Date:Mon, 05 Dec 2016 00:55:05 GMT
Expires:-1
Pragma:no-cache
Server:Microsoft-IIS/8.5
X-AspNet-Version:4.0.30319
X-Powered-By:ASP.NET

Which is then followed up by the actual request which looks like this (sent as a POST with a JSON body)

accept:application/json
Accept-Encoding:gzip, deflate, br
Accept-Language:en-GB,en-US;q=0.8,en;q=0.6
Cache-Control:no-cache
Connection:keep-alive
Content-Length:62
content-type:application/json
DNT:1
Host:someapi.net
Origin:http://my.origin.net
Pragma:no-cache
Referer:http://my.origin.net/
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.100 Safari/537.36
x-apikey:someapikey

Which returns a response like this (with a JSON body)

Access-Control-Allow-Origin:*
Cache-Control:no-cache
Connection:keep-alive
Content-Length:33576
Content-Type:application/json; charset=utf-8
Date:Mon, 05 Dec 2016 00:55:05 GMT
Expires:-1
Pragma:no-cache
Server:Microsoft-IIS/8.5
X-AspNet-Version:4.0.30319
X-Powered-By:ASP.NET
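
For reference, the request above could be built roughly like this. It is a sketch rather than our exact code: buildRequestInit is a hypothetical helper name, and the payload shape is not shown here. The non-simple content-type value and the custom x-apikey header are what cause the browser to send the preflight.

```javascript
// Hypothetical helper: options for the cross-origin POST described above.
function buildRequestInit(apiKey, payload) {
  return {
    method: 'POST',
    mode: 'cors', // cross-origin request; the browser enforces CORS
    headers: {
      accept: 'application/json',
      'content-type': 'application/json', // not a "simple" content type
      'x-apikey': apiKey,                 // custom header, needs preflight approval
    },
    body: JSON.stringify(payload),
  };
}

// Usage, assuming fetch comes from isomorphic-fetch or the browser and
// apiUrl points at the API host:
//   fetch(apiUrl, buildRequestInit('someapikey', payload))
//     .then((response) => response.json());
```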

El Yobo asked Dec 05 '16 01:12


1 Answer

Check the IP addresses of the failing Googlebot calls.

They may come from a nefarious actor pretending to be Google.

Verify the IP addresses as described here:

https://support.google.com/webmasters/answer/80553?hl=en
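
As a rough sketch of that check in Node.js: do a reverse DNS lookup on the IP, confirm the hostname ends in googlebot.com or google.com, then do a forward lookup on that hostname and confirm it resolves back to the same IP. The function names and the suffix list are my own assumptions based on that page, so check them against the current version of Google's documentation.

```javascript
const dns = require('dns').promises;

// Assumption: hostname suffixes that a genuine Googlebot PTR record ends in.
const GOOGLE_SUFFIXES = ['.googlebot.com', '.google.com'];

function isGoogleHostname(hostname) {
  // Normalise case and drop any trailing dot before comparing suffixes.
  const name = hostname.toLowerCase().replace(/\.$/, '');
  return GOOGLE_SUFFIXES.some((suffix) => name.endsWith(suffix));
}

// The resolver parameter defaults to Node's dns.promises API and is
// injectable so the logic can be exercised without network access.
async function isVerifiedGooglebot(ip, resolver = dns) {
  let hostnames;
  try {
    hostnames = await resolver.reverse(ip); // reverse (PTR) lookup
  } catch (err) {
    return false; // no PTR record, so it cannot be verified
  }
  for (const hostname of hostnames.filter(isGoogleHostname)) {
    try {
      // Forward lookup must map back to the original IP.
      // Note: plain string comparison; IPv6 addresses may need normalising.
      const records = await resolver.lookup(hostname, { all: true });
      if (records.some((record) => record.address === ip)) return true;
    } catch (err) {
      // Forward lookup failed for this hostname; try the next one.
    }
  }
  return false;
}
```

Running isVerifiedGooglebot(ip) against the client IPs recorded alongside the failing requests should show whether those requests really come from Google.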

stujo answered Nov 07 '22 23:11