I'm making a little bot to crawl a few websites. I'm just testing it out right now, and I tried two settings:
about 10 requests every 3 seconds - the IP got banned, so I figured that was too fast.
2 requests every 3 seconds - the IP got banned after 30 minutes and 1000+ links crawled.
Is that still too fast? I mean, we're talking about close to 1,000,000 links. Should I take the hint that "we just don't want to be crawled", or is that still too fast?
Thanks.
Edit
Tried again - 2 requests every 5 seconds - 30 minutes and 550 links later, I got banned.
I'll go with 1 request every 2 seconds, but I suspect the same will happen. I guess I'll have to contact an admin - if I can find one.
Here are some guidelines for web crawler politeness.
Typically, if a page takes x seconds to download, it is polite to wait at least 10x-15x before requesting another page from the same site.
Also make sure you are honoring robots.txt.
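As a rough sketch of what that looks like in practice, here is a minimal polite crawler loop in Python using only the standard library. The `next_delay` helper and the `MyBot/0.1` user-agent string are made-up names for illustration; the 10x factor is the lower bound of the guideline above, and `urllib.robotparser` is the stdlib module for checking robots.txt rules.

```python
import time
import urllib.robotparser
from urllib.request import urlopen

POLITENESS_FACTOR = 10  # wait at least 10x the last download time

def next_delay(download_seconds, factor=POLITENESS_FACTOR):
    # If the last page took 0.3 s to download, wait at least 3 s.
    return download_seconds * factor

def robots_allows(robots_txt, user_agent, url):
    # Parse a robots.txt body and check whether url may be fetched.
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

def crawl(urls, user_agent="MyBot/0.1"):
    # Hypothetical driver: fetch each URL, pausing between requests
    # in proportion to how long the previous download took.
    delay = 0.0
    for url in urls:
        time.sleep(delay)
        start = time.monotonic()
        with urlopen(url) as resp:  # real code should also set a User-Agent header
            resp.read()
        delay = next_delay(time.monotonic() - start)
```

In a real crawler you would fetch each site's `/robots.txt` once, cache the parsed rules, and skip any URL for which `robots_allows` returns `False` before ever issuing the request.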