I made a simple web crawler using PHP (and cURL). It parses roughly 60,000 HTML pages and retrieves product information (it's a tool on an intranet).
My main concern is the number of concurrent connections. I would like to limit it so that, whatever happens, the crawler never uses more than 15 concurrent connections.
The server blocks the IP whenever the limit of 25 concurrent connections per IP is reached, and for some reason I can't change that on the server side, so I have to find a way to make my script never use more than X concurrent connections.
Is this possible?
Or maybe I should rewrite the whole thing in another language?
Thank you, any help is appreciated!
Well, you can use curl_setopt($ch, CURLOPT_MAXCONNECTS, 15);
to limit the number of connections. But you might also want to write a simple connection manager if that doesn't do it for you.
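A minimal sketch of such a simple connection manager, assuming the crawler uses PHP's curl_multi_* functions: it keeps a rolling window of transfers, starting a new one only when a previous one finishes, so no more than $maxConcurrent requests are ever in flight. The function name crawlWithLimit, the $peak counter and the 30-second timeout are hypothetical choices for illustration, not something from the original question or answer.

```php
<?php
// Hypothetical helper (not from the original answer): fetches every URL in
// $urls via curl_multi while keeping at most $maxConcurrent transfers in
// flight at once. $peak receives the largest number of simultaneous handles.
function crawlWithLimit(array $urls, int $maxConcurrent, ?int &$peak = null): array
{
    $maxConcurrent = max(1, $maxConcurrent); // a window of 0 would never progress
    $mh = curl_multi_init();
    $queue = array_values($urls);
    $results = [];   // url => response body, or false on a cURL error
    $active = 0;     // handles added to $mh and not yet removed
    $peak = 0;

    while ($queue || $active > 0) {
        // Top up the window: never exceed $maxConcurrent open transfers.
        while ($queue && $active < $maxConcurrent) {
            $url = array_shift($queue);
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_TIMEOUT, 30);   // assumed timeout, in seconds
            curl_setopt($ch, CURLOPT_PRIVATE, $url); // remember which URL this handle fetches
            curl_multi_add_handle($mh, $ch);
            $active++;
        }
        $peak = max($peak, $active);

        // Let cURL make progress on all in-flight transfers.
        do {
            $status = curl_multi_exec($mh, $running);
        } while ($status === CURLM_CALL_MULTI_PERFORM);

        // Harvest finished transfers, freeing one slot for each.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$url] = ($info['result'] === CURLE_OK)
                ? curl_multi_getcontent($ch)
                : false;
            // The handle is released once nothing references it any more.
            curl_multi_remove_handle($mh, $ch);
            $active--;
        }

        // Wait for socket activity instead of busy-looping.
        if ($active > 0) {
            curl_multi_select($mh, 1.0);
        }
    }

    curl_multi_close($mh);
    return $results;
}
```

You would call it as crawlWithLimit($urls, 15). For 60,000 pages it is probably better to parse each body inside the harvest loop instead of collecting all of them in $results. Newer setups (PHP 7.0.7+ with libcurl 7.30.0+) also accept CURLMOPT_MAX_TOTAL_CONNECTIONS through curl_multi_setopt(), which caps connections inside libcurl itself; check the documentation for your versions before relying on it.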