Just because software is automated doesn't mean it will abide by your robots.txt. What are some methods available to detect when someone is crawling or DDoSing your website? Assume your site has hundreds of thousands of pages and is worth crawling or DDoSing.
Here's a dumb idea I had that probably doesn't work: give each user a cookie with a unique value, and use the cookie to recognize when someone is making a second, third, etc. request. The flaw is that many crawlers don't accept cookies, so under this scheme a robot would look like a new user on every request.
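For what it's worth, here is a minimal sketch of that cookie scheme using Flask; the cookie name `visitor_id` and the in-memory counter are illustrative choices, not anything standard, and a real deployment would use a shared store with expiry instead of a dict.

```python
# Sketch of the cookie idea: issue a unique cookie and count repeat requests.
import uuid
from flask import Flask, request, make_response

app = Flask(__name__)

# Rough request counts keyed by visitor cookie (illustrative; use Redis or
# similar with TTLs in production).
request_counts = {}

@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def page(path):
    visitor_id = request.cookies.get("visitor_id")
    resp = make_response(f"page: /{path}")

    if visitor_id is None:
        # No cookie was sent: issue one. A client that never returns it
        # (as the question notes, many crawlers won't) will land in this
        # branch on every single request and never be counted as a repeat.
        visitor_id = str(uuid.uuid4())
        resp.set_cookie("visitor_id", visitor_id)
    else:
        # Cookie came back, so this is a repeat visitor we can count.
        request_counts[visitor_id] = request_counts.get(visitor_id, 0) + 1

    return resp

if __name__ == "__main__":
    app.run()
```

As the question already points out, the scheme only sees clients that echo the cookie back; pairing the cookie with the client IP (or falling back to per-IP rate counting when no cookie is returned) partly closes that gap.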
Does anyone have better ideas?
You could put links in your pages that are not visible to, or clickable by, end users. Many bots simply follow every link they find, so once a client requests one of those links you almost certainly have a crawler/robot (see the sketch below).
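A minimal sketch of that hidden-link trap, again with Flask; the trap path `/trap/secret-page` and the in-memory `flagged_ips` set are made up for illustration.

```python
# Honeypot link: hide a link humans can't see or click, and flag any
# client that requests it.
from flask import Flask, request, abort

app = Flask(__name__)
flagged_ips = set()
TRAP_PATH = "/trap/secret-page"

@app.route("/")
def index():
    # The trap link is present in the markup but hidden from humans
    # (display:none), so only software that blindly follows links fetches it.
    return (
        "<html><body>"
        "<h1>Welcome</h1>"
        f'<a href="{TRAP_PATH}" style="display:none">do not follow</a>'
        "</body></html>"
    )

@app.route(TRAP_PATH)
def trap():
    # Anything requesting this URL followed a link no human could see
    # or click, so treat it as automated.
    flagged_ips.add(request.remote_addr)
    abort(403)

@app.before_request
def block_flagged():
    # Once an address is flagged, refuse all further requests from it.
    if request.remote_addr in flagged_ips:
        abort(403)
```

If you also list the trap path as `Disallow` in robots.txt, well-behaved crawlers will skip it, so only clients that ignore robots.txt (the ones you care about) get flagged.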