
Good source of Crawler / Spider IP addresses

Tags:

ip

web-crawler

Where could I find a comprehensive list of crawler or spider IP addresses? I need IPs from Google, Yahoo, Microsoft, and other search engines that regularly crawl my sites.

I do not want to block them, so please keep robots.txt out of the answers. The list is for a filter that does statistical reporting on activity on each page.

Please post links to good sources that could be used, paid or free.

MatBanik asked Jan 22 '11 22:01


People also ask

How do I find Googlebot?

Alternatively, you can identify Googlebot by IP address by matching the crawler's IP address to the list of Googlebot IP addresses. For other Google IP addresses from where your site may be accessed (for example, by user request or Apps Scripts), match the accessing IP address against the list of Google IP addresses.
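A minimal sketch of that IP matching in Python, assuming you have already obtained the crawler's published ranges in CIDR notation. The `CRAWLER_RANGES` values below are hypothetical placeholders taken from the reserved documentation blocks, not real Googlebot ranges, and `is_crawler_ip` is an illustrative name.

```python
import ipaddress

# Hypothetical placeholder ranges (reserved documentation blocks).
# Replace them with the ranges the search engine actually publishes.
CRAWLER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_crawler_ip(address, ranges=CRAWLER_RANGES):
    """Return True if the address falls inside any of the given networks."""
    ip = ipaddress.ip_address(address)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed lists are safe to scan.
    return any(ip in network for network in ranges)
```

A statistics filter could call `is_crawler_ip(remote_addr)` for each request and leave out the hits for which it returns True.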

What is Google's spider called?

"Crawler" (sometimes also called a "robot" or "spider") is a generic term for any program that is used to automatically discover and scan websites by following links from one webpage to another. Google's main crawler is called Googlebot.


2 Answers

Your web server logs. I believe they're free.
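As a rough sketch of pulling that information out of the logs, assuming they are in the common "combined" access log format (an assumption; adjust the pattern to your server's format). `LOG_LINE` and `agents_by_ip` are illustrative names, not part of any library.

```python
import re
from collections import defaultdict

# Assumed layout: ip ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def agents_by_ip(lines):
    """Map each client IP to the set of user agent strings it sent."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # silently skip lines that do not fit the assumed format
            seen[match.group("ip")].add(match.group("agent"))
    return dict(seen)
```

Passing an open log file to `agents_by_ip` gives you the IPs that hit your site together with what they claimed to be, which you can then review by hand.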

Jon answered Sep 25 '22 16:09


You probably don't want to do this by IP address. Most crawlers send a distinctive user agent string when they crawl your site, and it's much more likely you want to use that to identify them. I don't know where you can find a good list of those, though.
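Something like the following could do the user agent check, as a sketch only. The three tokens are the ones Googlebot, Bing's bingbot and Yahoo's Slurp include in their user agent strings; the `CRAWLER_TOKENS` table and the `crawler_name` helper are made up for illustration and are nowhere near a complete list. Keep in mind that any client can send whatever user agent it likes.

```python
# Illustrative, deliberately short table: lowercase token -> search engine.
CRAWLER_TOKENS = {
    "googlebot": "Google",
    "bingbot": "Microsoft Bing",
    "slurp": "Yahoo",
}


def crawler_name(user_agent):
    """Return the search engine a user agent claims to be, or None."""
    agent = user_agent.lower()
    for token, name in CRAWLER_TOKENS.items():
        if token in agent:
            return name
    return None
```

In the reporting filter you would then count a page view only when `crawler_name(user_agent)` is None.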

EDIT: Actually, this page I found with Google seems to both answer your question a bit and give the user agents (which is still more likely the better approach).

jcoder answered Sep 24 '22 16:09