How to detect search engine bots with PHP?

People also ask

How do you detect search engine bots?

The only officially supported way to identify Googlebot is to run a reverse DNS lookup on the accessing IP address, then run a forward DNS lookup on the resulting host name. The forward lookup must point back to the accessing IP address, and the host name must be in either the googlebot.com or the google.com domain.
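
A minimal sketch of that reverse-then-forward check, assuming PHP 7.1 or later. The function name is_verified_googlebot() and its two optional resolver parameters are made up for illustration; the parameters exist only so the logic can be tried without network access. Note that gethostbynamel() returns IPv4 addresses only, so this sketch will not confirm an IPv6 crawler address.

```php
<?php
// Hypothetical helper: verify a claimed Googlebot IP with reverse + forward DNS.
// $reverse and $forward default to PHP's built-in resolvers.
function is_verified_googlebot(
    string $ip,
    ?callable $reverse = null,
    ?callable $forward = null
): bool {
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged
    // when it cannot resolve it, and false on malformed input.
    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip) {
        return false;
    }

    // Step 2: the host name must END in .googlebot.com or .google.com.
    // Anchoring at the end rejects names like googlebot.com.evil.example.
    $host = strtolower(rtrim($host, '.'));
    if (!preg_match('/\.(googlebot|google)\.com$/', $host)) {
        return false;
    }

    // Step 3: forward lookup must lead back to the original IP.
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

In a request handler you would call is_verified_googlebot($_SERVER['REMOTE_ADDR']). DNS lookups are slow, so it is probably worth caching the result per IP.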

How do you know if you have a Googlebot?

Alternatively, you can identify Googlebot by IP address by matching the crawler's IP address to the list of Googlebot IP addresses. For other Google IP addresses from where your site may be accessed (for example, by user request or Apps Scripts), match the accessing IP address against the list of Google IP addresses.
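
The IP-list approach boils down to testing an address against a set of CIDR ranges. Here is a rough sketch, assuming PHP 7.1 or later. The helper names ip_in_cidr() and ip_in_any_range() are my own, and the ranges in the usage note are documentation placeholders, not real Googlebot ranges; you would load the real ones from the list Google publishes.

```php
<?php
// Hypothetical helper: is $ip inside the CIDR block $cidr? Works for IPv4 and IPv6.
function ip_in_cidr(string $ip, string $cidr): bool
{
    $parts  = explode('/', $cidr, 2);
    $ipBin  = @inet_pton($ip);
    $netBin = @inet_pton($parts[0]);

    // Reject invalid input and IPv4/IPv6 mismatches.
    if ($ipBin === false || $netBin === false || strlen($ipBin) !== strlen($netBin)) {
        return false;
    }

    $maxBits = strlen($ipBin) * 8;
    $bits    = isset($parts[1]) ? (int) $parts[1] : $maxBits;
    if ($bits < 0 || $bits > $maxBits) {
        return false;
    }

    // Compare the whole bytes covered by the prefix first.
    $fullBytes = intdiv($bits, 8);
    if (substr($ipBin, 0, $fullBytes) !== substr($netBin, 0, $fullBytes)) {
        return false;
    }

    // Then compare the remaining high bits of the next byte, if any.
    $remainder = $bits % 8;
    if ($remainder === 0) {
        return true;
    }
    $mask = (0xFF << (8 - $remainder)) & 0xFF;
    return (ord($ipBin[$fullBytes]) & $mask) === (ord($netBin[$fullBytes]) & $mask);
}

// Hypothetical helper: true if $ip falls in any of the given CIDR ranges.
function ip_in_any_range(string $ip, array $ranges): bool
{
    foreach ($ranges as $cidr) {
        if (ip_in_cidr($ip, $cidr)) {
            return true;
        }
    }
    return false;
}
```

Usage would look like ip_in_any_range($_SERVER['REMOTE_ADDR'], $ranges), where $ranges is an array such as array('192.0.2.0/24', '2001:db8::/32') filled from the published list.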

What is bot in search engine?

Search robots, also known as bots, wanderers, spiders, and crawlers, are the tools many web search engines, such as Google, Bing, and Yahoo!, use to build their databases. Most robots work like web browsers, except they don't require user interaction.

I use the following code which seems to be working fine:

function _bot_detected() {

  return (
    isset($_SERVER['HTTP_USER_AGENT'])
    && preg_match('/bot|crawl|slurp|spider|mediapartners/i', $_SERVER['HTTP_USER_AGENT'])
  );
}

Update 16-06-2017: added mediapartners to the pattern (see https://support.google.com/webmasters/answer/1061943?hl=en).

Here's a Search Engine Directory of Spider names

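Then check $_SERVER['HTTP_USER_AGENT'] to see whether the agent is one of those spiders.

// The header can be missing, so test for it before searching (case-insensitive).
if (isset($_SERVER['HTTP_USER_AGENT'])
    && stripos($_SERVER['HTTP_USER_AGENT'], 'googlebot') !== false)
{
    // what to do
}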

Check the $_SERVER['HTTP_USER_AGENT'] for some of the strings listed here:

http://www.useragentstring.com/pages/useragentstring.php

Or more specifically for crawlers:

http://www.useragentstring.com/pages/useragentstring.php?typ=Crawler

If you want to, say, log the number of visits from the most common search engine crawlers, you could use:

$interestingCrawlers = array( 'google', 'yahoo' );
$pattern = '/(' . implode('|', $interestingCrawlers) .')/';
$matches = array();
// The user agent is lowercased first, so no case-insensitive flag is needed.
// (The fourth argument of preg_match() is an integer flag, not a modifier string.)
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? strtolower($_SERVER['HTTP_USER_AGENT']) : '';
$numMatches = preg_match($pattern, $userAgent, $matches);
if($numMatches > 0) // Found a match
{
  // $matches[1] contains the name that matched, either 'google' or 'yahoo'
}
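
To actually keep a count per crawler, something along these lines could work. This is only a sketch, assuming PHP 7.1 or later: crawler_name() and tally_visit() are hypothetical helper names, and where the counts are stored between requests (file, database, cache) is left open.

```php
<?php
// Hypothetical helper: return the first crawler name found in the
// user agent (case-insensitive), or null if none matches.
function crawler_name(string $userAgent, array $names): ?string
{
    foreach ($names as $name) {
        if (stripos($userAgent, $name) !== false) {
            return strtolower($name);
        }
    }
    return null;
}

// Hypothetical helper: return a copy of $counts with the matching
// crawler's counter incremented; unchanged if no crawler matches.
function tally_visit(array $counts, string $userAgent, array $names): array
{
    $name = crawler_name($userAgent, $names);
    if ($name !== null) {
        $counts[$name] = ($counts[$name] ?? 0) + 1;
    }
    return $counts;
}
```

Per request you would load the stored counts, call tally_visit($counts, $userAgent, $interestingCrawlers), and save the result back.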

You can check whether it's a search engine with this function:

<?php
function crawlerDetect($USER_AGENT)
{
  // Display name => substring expected in the User-Agent header
  $crawlers = array(
      'Google' => 'Google',
      'MSN' => 'msnbot',
      'Rambler' => 'Rambler',
      'Yahoo' => 'Yahoo',
      'AbachoBOT' => 'AbachoBOT',
      'accoona' => 'Accoona',
      'AcoiRobot' => 'AcoiRobot',
      'ASPSeek' => 'ASPSeek',
      'CrocCrawler' => 'CrocCrawler',
      'Dumbot' => 'Dumbot',
      'FAST-WebCrawler' => 'FAST-WebCrawler',
      'GeonaBot' => 'GeonaBot',
      'Gigabot' => 'Gigabot',
      'Lycos spider' => 'Lycos',
      'MSRBOT' => 'MSRBOT',
      'Altavista robot' => 'Scooter',
      'AltaVista robot' => 'Altavista',
      'ID-Search Bot' => 'IDBot',
      'eStyle Bot' => 'eStyle',
      'Scrubby robot' => 'Scrubby',
      'Facebook' => 'facebookexternalhit',
  );
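  // Build one pattern from the crawler substrings; preg_quote() escapes
  // characters that are special in a regular expression.
  // It is better to cache this string than to rebuild it on every call.
  $crawlers_agents = implode('|', array_map('preg_quote', $crawlers));
  // Search for a crawler substring inside the user agent (not the other
  // way around), ignoring case.
  return preg_match('/' . $crawlers_agents . '/i', $USER_AGENT) === 1;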
}
?>

Then you can use it like this:

<?php $USER_AGENT = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
  if(crawlerDetect($USER_AGENT)) return "no need for language redirection";?>

