
Too aggressive bot?

I'm making a little bot to crawl a few websites. I'm just testing it out right now, and I have tried two settings:

  1. About 10 requests every 3 seconds: the IP got banned, so I said OK, that's too fast.

  2. 2 requests every 3 seconds: the IP got banned after 30 minutes and 1000+ links crawled.

Is that still too fast? We're talking about close to 1,000,000 links. Should I take the ban as the message "we just don't want to be crawled", or is the rate still too high?

Thanks.

Edit

Tried again with 2 requests every 5 seconds: 30 minutes and 550 links later, I got banned.

I'll go with 1 request every 2 seconds, but I suspect the same will happen. I guess I'll have to contact an admin, if I can find one.

sirrocco asked Feb 28 '23 19:02


1 Answer

Here are some guidelines for web crawler politeness.

Typically, if a page takes x seconds to download, it is polite to wait at least 10x to 15x seconds before making the next request to the same site.
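As a rough illustration of that rule, here is a minimal Python sketch that times each download and sleeps for a multiple of it before the next request. The names `polite_delay` and `fetch_politely`, the default factor of 10, and the 1-second floor are assumptions made for this example, not part of any standard.

```python
import time
import urllib.request


def polite_delay(download_seconds, factor=10.0, min_delay=1.0):
    """Seconds to wait before the next request to the same host.

    factor=10.0 follows the 10x rule of thumb; min_delay is an
    assumed floor so that very fast responses still get a pause.
    """
    return max(min_delay, factor * download_seconds)


def fetch_politely(urls, factor=10.0, min_delay=1.0, fetch=None, sleep=time.sleep):
    """Fetch URLs one at a time, pausing after each in proportion to its download time."""
    if fetch is None:
        # Default fetcher: a plain blocking HTTP GET.
        fetch = lambda url: urllib.request.urlopen(url, timeout=30).read()
    pages = {}
    for url in urls:
        start = time.monotonic()
        pages[url] = fetch(url)
        elapsed = time.monotonic() - start
        # A slow server earns a longer pause, so the crawler backs off under load.
        sleep(polite_delay(elapsed, factor, min_delay))
    return pages
```

With these assumed defaults, a page that takes 0.5 seconds to download leads to a 5-second pause, which is much slower than 2 requests every 3 seconds.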

Also make sure you are honoring robots.txt.
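A minimal sketch of a robots.txt check using Python's standard `urllib.robotparser` module is shown below. The rules in `EXAMPLE_ROBOTS`, the `example.com` URLs, and the user agent string `my-little-bot` are made up for illustration; a real crawler would load the target site's own file instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example runs offline.
EXAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
""".splitlines()

USER_AGENT = "my-little-bot"  # assumed name for this example


def load_rules(robots_lines):
    """Build a parser from the lines of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


rules = load_rules(EXAMPLE_ROBOTS)

# For a live site, fetch the file instead of parsing inline lines:
#   parser = RobotFileParser()
#   parser.set_url("http://example.com/robots.txt")
#   parser.read()

allowed_private = rules.can_fetch(USER_AGENT, "http://example.com/private/page")
allowed_public = rules.can_fetch(USER_AGENT, "http://example.com/public/page")
# crawl_delay returns None when the site does not specify one.
delay = rules.crawl_delay(USER_AGENT)

print(allowed_private, allowed_public, delay)
```

If the site specifies a `Crawl-delay`, it seems reasonable to treat it as a lower bound on the pause between requests.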

z - answered Mar 05 '23 16:03