We log visits and visitors on page hits, and bots are clogging up our database. We can't use a CAPTCHA or similar techniques because this happens before we ask for any human input: we are simply logging page hits, and we would like to log only hits made by humans.
Is there a list of known bot IPs out there? Does checking known bot user-agents work?
How can bot traffic be identified? Web engineers can look directly at network requests to their sites and identify likely bot traffic. An integrated web analytics tool, such as Google Analytics or Heap, can also help to detect bot traffic.
Bot detection mitigates scripted attacks by flagging requests that are likely to come from a bot. These attacks are sometimes called credential stuffing or list validation attacks. Detecting them provides protection while adding very little friction for legitimate users.
There is no sure-fire way to catch all bots: a bot can be made to behave exactly like a real browser if its author wants it to.
Most serious bots identify themselves clearly in the user-agent string, so with a list of known bot agents you can filter out most of them. To that list you can also add the agent strings that some HTTP libraries send by default, to catch bots written by people who never bother to change the agent string. If you log the user-agent strings of your visitors for a while, you should be able to pick out the ones to add to the list.
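A minimal sketch of that filter in Python (framework-agnostic): the marker list below is illustrative rather than exhaustive, and `save_hit_to_db` is a hypothetical stub standing in for your real database insert.

```python
# Check the user-agent string against a substring list before writing the hit.
KNOWN_BOT_MARKERS = [
    "googlebot", "bingbot", "yandexbot", "duckduckbot", "baiduspider",  # self-identifying crawlers
    "curl", "wget", "python-requests", "libwww-perl", "java/",          # default HTTP-library agent strings
    "bot", "crawler", "spider",                                         # generic catch-alls
]

def is_probable_bot(user_agent: str) -> bool:
    """Return True when the user-agent is empty or matches a known bot marker."""
    if not user_agent:
        return True  # real browsers always send a user-agent; many scripts don't
    ua = user_agent.lower()
    return any(marker in ua for marker in KNOWN_BOT_MARKERS)

def save_hit_to_db(ip: str, path: str, user_agent: str) -> None:
    print("logged:", ip, path)  # stub: replace with your real insert

def log_page_hit(ip: str, path: str, user_agent: str) -> None:
    if is_probable_bot(user_agent):
        return  # skip the database write entirely
    save_hit_to_db(ip, path, user_agent)
```

So `log_page_hit("203.0.113.5", "/home", "Mozilla/5.0 ...")` writes the hit, while the same call with "Googlebot/2.1" in the agent string is silently dropped.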
You can also set up a "bad bot trap": put a hidden link on your page that leads to a URL disallowed in your robots.txt file. Well-behaved bots won't follow the link, and humans can't see or click it, so only bots that ignore the rules will request the page.
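One way to wire the trap up, assuming an arbitrary trap path of /bot-trap/ and reusing `is_probable_bot` and `save_hit_to_db` from the sketch above:

```python
# First, disallow the trap path in robots.txt so well-behaved bots skip it:
#
#     User-agent: *
#     Disallow: /bot-trap/
#
# Then hide the link from humans in your page markup, e.g.:
#
#     <a href="/bot-trap/" style="display:none" aria-hidden="true" tabindex="-1"></a>
#
# Any client that still requests the path has ignored robots.txt, so flag its IP.
flagged_ips = set()  # in production, persist this in your database

def handle_hit(ip: str, path: str, user_agent: str) -> None:
    if path.startswith("/bot-trap/"):
        flagged_ips.add(ip)  # caught: remember the offender, never log it
        return
    if ip in flagged_ips or is_probable_bot(user_agent):
        return  # flagged IP or self-identified bot: don't log the hit
    save_hit_to_db(ip, path, user_agent)
```

Keeping the flagged IPs lets you drop all future hits from a trapped client, even when its requests otherwise look like a normal browser.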