
How to recognize bots with PHP?

I am building stats for my users and don't want visits from bots to be counted.

Right now I have a basic PHP script with MySQL that increments a counter by 1 each time the page is called.

But bot visits are also added to the count.

Can anyone think of a way to exclude them?

It is mainly the major ones that mess things up: Google, Yahoo, MSN, etc.

asked Jan 08 '09 02:01 by Hugo Gameiro



2 Answers

You can check the User-Agent string: empty strings, or strings containing 'robot', 'spider', 'crawler', or 'curl', are likely to come from robots.

preg_match('/robot|spider|crawler|curl|^$/i', $_SERVER['HTTP_USER_AGENT']);
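As a rough sketch, the check above could be wrapped in a helper and used to guard the counter update. The function name looksLikeBot is hypothetical, and the pattern is widened here as an assumption: 'robot' alone does not match 'Googlebot' or 'msnbot', so this version matches the shorter 'bot' and adds 'slurp' for Yahoo's crawler. It also treats a missing User-Agent header the same as an empty one.

```php
<?php
// Hypothetical helper (not from the original answer): returns true when the
// User-Agent string looks like it belongs to a bot.
function looksLikeBot(?string $userAgent): bool
{
    // A missing or empty User-Agent header is treated as a bot.
    if ($userAgent === null || trim($userAgent) === '') {
        return true;
    }

    // Assumption: 'bot' is used instead of 'robot' so that 'Googlebot' and
    // 'msnbot' match too; 'slurp' covers Yahoo's crawler.
    return preg_match('/bot|spider|crawler|curl|slurp/i', $userAgent) === 1;
}

// The header may be absent, so fall back to null instead of raising a notice.
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? null;

if (!looksLikeBot($userAgent)) {
    // Placeholder: run the existing MySQL "count + 1" update here.
}
```

Matching on a short token like 'bot' can occasionally flag a real browser whose User-Agent happens to contain it, so it may be worth logging what gets filtered for a while before trusting the numbers.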

answered Sep 25 '22 18:09 by Rob


You should filter by user-agent string. You can find a list of about 300 common user-agents sent by bots here: http://www.robotstxt.org/db.html. Checking each request against that list and skipping your SQL statement when the user-agent belongs to a bot should solve your problem for all practical purposes.
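A minimal sketch of that list-based filter might look like the following. The four signatures are only a hand-picked sample standing in for the full list, and the names BOT_SIGNATURES, matchesKnownBot, and recordVisit are hypothetical. The counter update is passed in as a callback so the example does not assume anything about your database code.

```php
<?php
// Hypothetical sample; in practice this would be built from a maintained
// list of bot user-agents such as the one linked above.
const BOT_SIGNATURES = ['googlebot', 'slurp', 'msnbot', 'bingbot'];

// Returns true when the User-Agent contains any known bot signature.
function matchesKnownBot(string $userAgent, array $signatures = BOT_SIGNATURES): bool
{
    foreach ($signatures as $signature) {
        // stripos() compares case-insensitively and returns false on no match.
        if (stripos($userAgent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

// Runs $incrementCounter only for non-bot visits; returns whether it counted.
function recordVisit(string $userAgent, callable $incrementCounter): bool
{
    if (matchesKnownBot($userAgent)) {
        return false; // Bot visit: skip the SQL statement entirely.
    }
    $incrementCounter();
    return true;
}
```

In the page itself, the callback would wrap the existing MySQL update, for example recordVisit($_SERVER['HTTP_USER_AGENT'] ?? '', $yourUpdateFunction), where $yourUpdateFunction is whatever currently increments the count.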

If you don't want the search engines to even reach the page, use a basic robots.txt file to block them.
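For illustration, a basic robots.txt placed at the site root could look like this. The /stats/ path is a made-up example; replace it with the path of the page you want crawlers to skip.

```text
User-agent: *
Disallow: /stats/
```

Keep in mind that robots.txt is only honored by well-behaved crawlers, and blocking a page this way also keeps it from being crawled for search results.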

answered Sep 21 '22 18:09 by ine