BOT/Spider Trap Ideas

Question

I have a client whose domain seems to be getting hit pretty hard by what appears to be a DDoS. In the logs it's normal looking user agents with random IPs but they're flipping through pages too fast to be human. They also don't appear to be requesting any images. I can't seem to find any pattern and my suspicion is it's a fleet of Windows Zombies.

The clients had issues in the past with SPAM attacks--even had to point MX at Postini to get the 6.7 GB/day of junk to stop server-side.

I want to setup a BOT trap in a directory disallowed by robots.txt... just never attempted anything like this before, hoping someone out there has a creative ideas for trapping BOTs!

EDIT: I already have plenty of ideas for catching one.. it's what to do to it when lands in the trap.

Alex Howansky · Accepted Answer

You can set up a PHP script whose URL is explicitly forbidden by robots.txt. In that script, you can pull the source IP of the suspected bot hitting you (via $_SERVER['REMOTE_ADDR']), and then add that IP to a database blacklist table.

Then, in your main app, you can check the source IP, do a lookup for that IP in your blacklist table, and if you find it, throw a 403 page instead. (Perhaps with a message like, "We've detected abuse coming from your IP, if you feel this is in error, contact us at ...")

On the upside, you get automatic blacklisting of bad bots. On the downside, it's not terribly efficient, and it can be dangerous. (One person innocently checking that page out of curiosity can result in the ban of a large swath of users.)

Edit: Alternatively (or additionally, I suppose) you can fairly simply add a GeoIP check to your app, and reject hits based on country of origin.

Scott Chamberlain · Answer

What you can do is get another box (a kind of sacrificial lamb) not on the same pipe as your main host then have that host a page which redirects to itself (but with a randomized page name in the url). this could get the bot stuck in a infinite loop tieing up the cpu and bandwith on your sacrificial lamb but not on your main box.

Chris Adragna · Answer

I tend to think this is a problem better solved with network security more so than coding, but I see the logic in your approach/question.

There are a number of questions and discussions about this on server fault which may be worthy of investigating.

https://serverfault.com/search?q=block+bots

BOT/Spider Trap Ideas

Tags:

php

bots

zombie-process

web-crawler

robots.txt

Mikey1980

3 Answers

Alex Howansky

Scott Chamberlain

Chris Adragna

Recent Activity

Donate For Us

BOT/Spider Trap Ideas

Tags:

php

bots

zombie-process

web-crawler

robots.txt

Mikey1980

3 Answers

Alex Howansky

Scott Chamberlain

Chris Adragna

Related questions

Recent Activity

Donate For Us