Is there any way to detect search engines or crawlers on my site? I have seen that the phpBB admin panel lets you view and allow search engines, and also shows the last visit of each bot (such as Googlebot).
Is there a script for this in PHP? I don't mean Google Analytics or a similar application. I need to implement this for my blog site, and I think there must be some way to find out.
You can go by either IP addresses or the 'User-Agent' string that the bot or web browser sends you.
When Googlebot (or most other well-behaved robots) visits your website, it sends a User-Agent header identifying itself, which PHP exposes as $_SERVER['HTTP_USER_AGENT']. Some examples are:
Googlebot/2.1 (+http://www.google.com/bot.html)
NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html)
Baiduspider+(+http://www.baidu.com/search/spider_jp.html)
Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/531.4 (KHTML, like Gecko)
You can find many more examples in published lists of user-agent strings.
You could then use PHP to examine those user-agent strings and determine if the user is a search engine or not. I use something like this often:
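// Substrings that identify known crawlers (matched case-insensitively).
$searchengines = array(
    'Googlebot',
    'Slurp',
    'search.msn.com',
    'nutch',
    'simpy',
    'bot',
    'ASPSeek',
    'crawler',
    'msnbot',
    'Libwww-perl',
    'FAST',
    'Baidu',
);

// Some clients send no User-Agent header at all, so default to an empty string.
$user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

$is_se = false;
foreach ($searchengines as $searchengine) {
    // stripos() is a case-insensitive strpos(), so no strtolower() calls are needed.
    if ($user_agent !== '' && stripos($user_agent, $searchengine) !== false) {
        $is_se = true;
        break;
    }
}

if ($is_se) {
    print("It's a search engine!");
}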
Remember that no detection method (Google Analytics, another statistics package, or anything else) is going to be 100% accurate. Some web browsers let you set a custom user-agent string, and some misbehaving web crawlers may not send a user-agent string at all. This method is probably effective for 95%+ of crawlers/visitors, though.
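Since the question also asks about showing a bot's last visit the way phpBB does, here is a minimal sketch of how the keyword match above could feed a last-visit log. The function names `detect_bot` and `record_bot_visit` and the file name `bot_visits.json` are made up for illustration, and a flat JSON file is only an assumption to keep the example short; a database table would likely suit a real blog better.

```php
<?php
// Hypothetical helper: returns the first keyword found in the user agent, or null.
function detect_bot(?string $userAgent, array $keywords): ?string
{
    if ($userAgent === null || $userAgent === '') {
        return null; // no User-Agent header was sent
    }
    foreach ($keywords as $keyword) {
        if (stripos($userAgent, $keyword) !== false) {
            return $keyword;
        }
    }
    return null;
}

// Hypothetical helper: stores the latest visit time per bot keyword in a JSON file
// and returns the updated list.
function record_bot_visit(string $file, string $bot, int $time): array
{
    $visits = [];
    if (is_file($file)) {
        $decoded = json_decode((string) file_get_contents($file), true);
        if (is_array($decoded)) {
            $visits = $decoded;
        }
    }
    $visits[$bot] = $time;
    file_put_contents($file, json_encode($visits), LOCK_EX);
    return $visits;
}

// Usage on each page load (the keyword list here is shortened for the example).
$bot = detect_bot($_SERVER['HTTP_USER_AGENT'] ?? null, ['Googlebot', 'Slurp', 'msnbot', 'Baidu']);
if ($bot !== null) {
    record_bot_visit(__DIR__ . '/bot_visits.json', $bot, time());
}
```

An admin page could then read the same file and print each bot name with `date()` applied to its timestamp.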
You can try to detect them using their user-agent string. A list of them can be found here: http://www.botsvsbrowsers.com/
Search engines tend to include the words "crawler" or "robot" in their user-agent strings.
Search engines are almost the only clients that request robots.txt.
Some IP addresses are known to belong to bots such as Googlebot.
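For the IP-based approach, one commonly described check is a forward-confirmed reverse DNS lookup: resolve the IP to a host name, confirm the host name is under a domain the crawler operator uses, then resolve that host name back and make sure it returns the same IP. The sketch below assumes Googlebot host names end in googlebot.com or google.com; treat that domain list as an assumption to check against Google's current documentation. The function name `is_verified_googlebot` and the two optional resolver parameters are hypothetical and exist only so the logic can be tested without network access.

```php
<?php
// Sketch of forward-confirmed reverse DNS for an IP that claims to be Googlebot.
// $reverse and $forward are hypothetical injection points; by default they are
// PHP's gethostbyaddr() and gethostbynamel() (the latter resolves IPv4 only).
function is_verified_googlebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged on failure
    // and false on malformed input.
    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip) {
        return false;
    }

    // Step 2: the host name must end in one of the assumed Google domains.
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }

    // Step 3: the forward lookup must lead back to the original IP.
    $addresses = $forward($host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}
```

DNS lookups are slow, so it probably makes sense to run this only for requests whose user-agent string already claims to be Googlebot, and to cache the result per IP.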