I have a script which runs 1000 cURL requests using curl_multi_* functions in PHP.
What is the bottleneck behind them timing out?
Would it be CPU usage? Is there a more efficient way for the server to handle that number of outbound connections?
I cannot change the functionality and the requests themselves are simple calls to a remote API. I am just wondering what the limit is - would I need to increase memory on the server, or Apache connections, or CPU? (Or something else I have missed)
Your requests are made in a single thread of execution. The bottleneck is almost certainly CPU. Have you ever actually watched curl_multi code run? It is incredibly CPU hungry, because you don't have enough control over how the requests are serviced. curl_multi makes it possible to orchestrate 1000 requests at once, but that doesn't make it a good idea. You have almost no chance of using curl_multi efficiently because you cannot control the flow of execution finely enough; just servicing the sockets and select()'ing on them accounts for a lot of the high CPU usage you would see watching your code run on the command line.
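For reference, a curl_multi loop that sleeps in curl_multi_select() between rounds, rather than spinning on curl_multi_exec(), might look roughly like this. It is only a sketch: fetch_all is a made-up name, and the 30 second request timeout and 1 second select timeout are assumptions, not values from the question.

```php
<?php
/**
 * Run all requests through one curl_multi handle, blocking in
 * curl_multi_select() while waiting for socket activity.
 * fetch_all() is a hypothetical helper, not part of the curl extension.
 */
function fetch_all(array $urls, float $selectTimeout = 1.0): array
{
    $mh = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body in memory
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // assumed per-request limit
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    $running = 0;
    do {
        $status = curl_multi_exec($mh, $running);
        if ($status !== CURLM_OK) {
            break; // the multi handle itself reported an error
        }
        // Wait for activity instead of looping flat out. Some builds
        // return -1 immediately, so back off briefly in that case.
        if ($running > 0 && curl_multi_select($mh, $selectTimeout) === -1) {
            usleep(1000);
        }
    } while ($running > 0);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);

    return $results;
}
```

Even written this way, every handle is opened at once and serviced by the one thread, so the select() and socket bookkeeping still scale with the number of requests.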
The reason CPU usage is high during such tasks is this: PHP is designed to run for a fraction of a second and do everything as fast as it can. It usually does not matter how the CPU is utilized, because it is for such a short space of time. When you prolong a task like this the problem becomes more apparent, and the overhead incurred with every opcode becomes visible to the programmer.
I'm aware you have said you cannot change the implementation, but still, for a complete answer: such a task is far more suitable for threading than for curl_multi, and you should start reading http://php.net/pthreads, starting with http://php.net/Thread
Left to their own devices on an idle CPU, even 1000 threads would consume as much CPU as curl_multi. The point is that you can control precisely the code responsible for downloading every byte of the response and uploading every byte of the request. If CPU usage is a concern, you can implement a "nice" process by explicitly calling usleep, or by limiting connection usage in a meaningful way. Additionally, your requests can be serviced in separate threads.
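To make "limiting connection usage" concrete, here is a minimal sketch that keeps only a fixed number of requests in flight and starts the next one as each finishes. It is shown with curl_multi rather than pthreads so that it runs on a stock PHP with the curl extension. The name fetch_windowed, the default window of 50 and the 30 second timeout are assumptions made for illustration.

```php
<?php
/**
 * Fetch URLs with at most $maxConcurrent transfers open at a time.
 * fetch_windowed() is a hypothetical helper, not a library function.
 * Failed transfers are reported as null.
 */
function fetch_windowed(array $urls, int $maxConcurrent = 50, float $selectTimeout = 1.0): array
{
    $mh = curl_multi_init();
    $pending = array_keys($urls); // keys of URLs not started yet
    $results = [];
    $inFlight = 0;

    do {
        // Top the window up to the allowed number of connections.
        while ($inFlight < $maxConcurrent && $pending) {
            $key = array_shift($pending);
            $ch = curl_init($urls[$key]);
            curl_setopt_array($ch, [
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_TIMEOUT        => 30,            // assumed limit
                CURLOPT_PRIVATE        => (string) $key, // remember which URL this is
            ]);
            curl_multi_add_handle($mh, $ch);
            $inFlight++;
        }

        $running = 0;
        curl_multi_exec($mh, $running);

        // Collect finished transfers and free their slots.
        $freed = 0;
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $key = curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$key] = $info['result'] === CURLE_OK
                ? curl_multi_getcontent($ch)
                : null;
            curl_multi_remove_handle($mh, $ch);
            $inFlight--;
            $freed++;
        }

        // Nothing finished this round: sleep until a socket is ready.
        if ($freed === 0 && $inFlight > 0
            && curl_multi_select($mh, $selectTimeout) === -1) {
            usleep(1000);
        }
    } while ($inFlight > 0 || $pending);

    curl_multi_close($mh);

    return $results;
}
```

Something like fetch_windowed($urls, 25) would then cap the script at 25 open connections. The right cap depends on the remote API and the server, so treat the number as a tuning knob rather than a recommendation.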
I do not suggest that 1000 threads is the thing to do; it is more than likely not. The thing to do would be to design a Stackable (see the documentation) whose job is to make and service a request in a "nice", efficient way, and to design pools (see the examples in the github/pecl extension sources) of workers to execute your newly designed requests.