I have a very interesting problem that I am failing to explain.
Every 2 to 6 seconds Googlebot (I have looked up Googlebot's IP and it is the real thing [using host IP]) requests a page on our site (running PHP, Apache, and MongoDB) that does not exist (it 404s). No other robot or human has ever requested a page like this, just Googlebot.
The requests each look something like this:
/2de4f853c2853807b2e72387aa8928a4
/ea5700c343d1a9798bc554af7c1a330e
/e5aafa102d54ba7517703336846cc019
Our code does not use any 32-character strings, and there are no links anything like that, internal or external to our site. We use CodeIgniter, so at first I thought it was the default session_id; I have checked, and it is not.
Has anyone ever seen anything like this? Our website uses history.push on some pages. Could this cause it? Just an idea.
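To see how many requests share this shape, the access log paths can be filtered with a pattern. This is a minimal sketch, assuming the odd paths are always a slash followed by exactly 32 lowercase hex characters (the shape of an MD5 hex digest, although nothing here confirms that is what they are). The names `HEX32_PATH` and `is_hex32_path` are made up for illustration.

```python
import re

# Assumption: a suspicious path is "/" plus exactly 32 lowercase hex characters.
HEX32_PATH = re.compile(r"/[0-9a-f]{32}")


def is_hex32_path(path: str) -> bool:
    """Return True when the whole path matches the 32-hex-character shape."""
    return HEX32_PATH.fullmatch(path) is not None


# The three example paths from the question, plus two ordinary ones.
paths = [
    "/2de4f853c2853807b2e72387aa8928a4",
    "/ea5700c343d1a9798bc554af7c1a330e",
    "/e5aafa102d54ba7517703336846cc019",
    "/index.php",
    "/about",
]

suspicious = [p for p in paths if is_hex32_path(p)]
print(len(suspicious), "of", len(paths), "paths match")
```

Counting matches per client address or per minute would show whether the 2 to 6 second interval is steady, which is more typical of a scheduled check than of a crawler.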
Raw Data of an example request:
array (
  'date' => '2012-12-01',
  'time' => '10:01:33 PM',
  'additional_data' =>
  array (
    'server_vars' =>
    array (
      'REDIRECT_STATUS' => '200',
      'HTTP_HOST' => 'www.xxxxxxx.com',
      'HTTP_ACCEPT' => '*/*',
      'HTTP_ACCEPT_ENCODING' => 'gzip,deflate',
      'HTTP_FROM' => 'googlebot(at)googlebot.com',
      'HTTP_USER_AGENT' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
      'HTTP_X_FORWARDED_FOR' => 'xxxxxxx',
      'HTTP_X_FORWARDED_PORT' => '80',
      'HTTP_X_FORWARDED_PROTO' => 'http',
      'HTTP_CONNECTION' => 'keep-alive',
      'PATH' => '/sbin:/usr/sbin:/bin:/usr/bin:/home/ec2-user/ec2/bin',
      'SERVER_SIGNATURE' => '<address>Apache/2.2.22 (Amazon) Server at www.xxxxxxx.com Port 80</address>
',
      'SERVER_SOFTWARE' => 'Apache/2.2.22 (Amazon)',
      'SERVER_NAME' => 'www.xxxxxxx.com',
      'SERVER_ADDR' => 'xxxxxxxxxx',
      'SERVER_PORT' => '80',
      'REMOTE_ADDR' => '10.171.147.114',
      'REMOTE_PORT' => '40759',
      'REDIRECT_URL' => '/e5aafa102d54ba7517703336846cc019',
      'GATEWAY_INTERFACE' => 'CGI/1.1',
      'SERVER_PROTOCOL' => 'HTTP/1.1',
      'REQUEST_METHOD' => 'GET',
      'QUERY_STRING' => '',
      'REQUEST_URI' => '/e5aafa102d54ba7517703336846cc019',
      'SCRIPT_NAME' => '/index.php',
      'PATH_INFO' => '/e5aafa102d54ba7517703336846cc019',
      'PATH_TRANSLATED' => 'redirect:/index.php/e5aafa102d54ba7517703336846cc019',
      'PHP_SELF' => '/index.php/e5aafa102d54ba7517703336846cc019',
      'REQUEST_TIME' => 1354428093,
    ),
    'codeigiter_session' =>
    array (
      'session_id' => 'c795e40a279f58d9fbbf7f5501a26787',
      'ip_address' => '10.171.147.114',
      'user_agent' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
      'last_activity' => 1354428093,
      'user_data' => '',
    ),
  ),
)
What else can I collect to figure this out? It's very strange.
Update: The traffic is coming from two primary IP addresses: 10.171.147.114 and 10.161.46.102.
I have looked these up and they are not Googlebot.
This is what one IP lookup site returned:
"Remember that IP address ranges 10.0.0.0 – 10.255.255.255, 172.16.0.0 – 172.31.255.255, 192.168.0.0 – 192.168.255.255 and 224.0.0.0 - 239.255.255.255 are reserved IP Addresses for private internet use and IP lookup for these will not return any results."
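The two addresses can also be classified locally, without a lookup site. Below is a small sketch using Python's standard `ipaddress` module; the helper name `describe` is made up for illustration. One caveat about the quoted text: as far as I know, 224.0.0.0 to 239.255.255.255 is the multicast block rather than private address space, so the sketch reports it separately.

```python
import ipaddress


def describe(ip: str) -> str:
    """Roughly classify an address as multicast, private, or public."""
    addr = ipaddress.ip_address(ip)
    if addr.is_multicast:
        return "multicast"
    if addr.is_private:
        return "private"
    return "public"


# The two addresses from the update, plus two others for contrast.
for ip in ["10.171.147.114", "10.161.46.102", "224.0.0.1", "8.8.8.8"]:
    print(ip, "->", describe(ip))
```

Both logged addresses fall in 10.0.0.0/8, so they belong to something inside the same private network as the web server, not to a host on the public internet.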
What should I do, or what can I do, about these requests? What is the point of them? If this is some type of DoS attack, they are doing a very bad job of it.
To answer this question: the problem was being caused by the AWS load balancer's health checks. For some reason AWS is using the Googlebot user agent to perform them on our servers.
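For anyone debugging something similar, here is a rough sketch of two checks that would have narrowed this down sooner. It assumes the server sits behind a proxy that sets X-Forwarded-For with the original client as the leftmost entry, and it only trusts that header when REMOTE_ADDR is a private address. The function names are hypothetical, and the reverse-then-forward DNS test follows my understanding of how Google suggests verifying its crawler, so treat it as an approximation rather than a definitive implementation.

```python
import ipaddress
import socket


def client_ip(server_vars: dict) -> str:
    """Pick the likely client address from CGI-style server variables."""
    remote = server_vars.get("REMOTE_ADDR", "")
    forwarded = server_vars.get("HTTP_X_FORWARDED_FOR", "")
    try:
        # Assumption: a private REMOTE_ADDR means a proxy or balancer sent this.
        behind_proxy = ipaddress.ip_address(remote).is_private
    except ValueError:
        behind_proxy = False
    if behind_proxy and forwarded:
        # Leftmost entry is conventionally the original client.
        return forwarded.split(",")[0].strip()
    return remote


def _reverse(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> list:
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(ip: str, reverse=_reverse, forward=_forward) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-resolve it back."""
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        return ip in forward(host)
    except OSError:
        # Covers failed DNS lookups (socket.herror, socket.gaierror).
        return False
```

The `reverse` and `forward` parameters exist only so the DNS lookups can be swapped out when testing offline. The user agent string alone proves nothing, since any client can send it; the check has to run on the address returned by `client_ip`, not on the private REMOTE_ADDR.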