 

PHP Parallel curl requests

I am doing a simple app that reads JSON data from 15 different URLs. I have a special need: I need to do this server-side. I am using file_get_contents($url).

Since I am using file_get_contents($url), I wrote a simple script; here it is:

$websites = array(
    $url1,
    $url2,
    $url3,
    ...
    $url15
);

foreach ($websites as $website) {
    $data[] = file_get_contents($website);
}

and it proved to be very slow, because it waits for the first request to finish before starting the next one.

asked Feb 16 '12 by user1205408


People also ask

How do you run curls in parallel?

The solution to this is to use the xargs command alongside the curl command. The -P flag denotes the number of requests to run in parallel. The section <(printf '%s\n' {1..10}) prints out the numbers 1 to 10 and causes the curl command to run 10 times, with 5 requests running in parallel.

Is PHP cURL asynchronous?

Short answer: no, it isn't asynchronous. Longer answer: "not unless you wrote the backend yourself to do so." If you're using XHR, each request is going to have a different worker thread on the backend, which means no request should block any other, barring hitting process and memory limits.

How do you use cURL multi?

To use the multi interface, you must first create a 'multi handle' with curl_multi_init. This handle is then used as input to all further curl_multi_* functions. With a multi handle and the multi interface you can do several simultaneous transfers in parallel. Each single transfer is built up around an easy handle.
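For illustration, here is a minimal sketch of that flow, assuming two placeholder URLs (example.com and example.org, not taken from the question): create the multi handle, build one easy handle per transfer, add them all, drive the transfers, then collect the bodies.

<?php
// minimal curl_multi sketch; the URLs are placeholder examples
$urls = ["http://example.com", "http://example.org"];

$mh = curl_multi_init();                  // the 'multi handle'
$handles = [];
foreach ($urls as $url) {
    $ch = curl_init($url);                // one easy handle per transfer
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);      // hand it over to the multi handle
    $handles[$url] = $ch;
}

// run all transfers in parallel until every handle is done
do {
    curl_multi_exec($mh, $running);
    if ($running > 0) {
        curl_multi_select($mh);           // wait for activity instead of spinning
    }
} while ($running > 0);

$bodies = [];
foreach ($handles as $url => $ch) {
    $bodies[$url] = curl_multi_getcontent($ch);   // response body of each transfer
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);

The curl_multi_select() call is what keeps the loop from spinning at 100% CPU while transfers are still in flight, which is also the point the second answer below makes.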

What is Curl_multi_exec?

curl_multi_exec(CurlMultiHandle $multi_handle, int &$still_running): int. Processes each of the handles in the stack. This method can be called whether or not a handle needs to read or write data.


2 Answers

If you mean multi-curl, then something like this might help:

$nodes = array($url1, $url2, $url3);
$node_count = count($nodes);

$curl_arr = array();
$master = curl_multi_init();

for ($i = 0; $i < $node_count; $i++) {
    $url = $nodes[$i];
    $curl_arr[$i] = curl_init($url);
    curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($master, $curl_arr[$i]);
}

do {
    curl_multi_exec($master, $running);
} while ($running > 0);

for ($i = 0; $i < $node_count; $i++) {
    $results[] = curl_multi_getcontent($curl_arr[$i]);
}
print_r($results);

Hope it helps in some way

answered Sep 26 '22 by Sudhir Bastakoti


I don't particularly like the approach of any of the existing answers.

Timo's code: it might sleep/select() during CURLM_CALL_MULTI_PERFORM, which is wrong, and it might also fail to sleep when ($still_running > 0 && $exec != CURLM_CALL_MULTI_PERFORM), which may make the code spin at 100% CPU usage (of 1 core) for no reason.

Sudhir's code: it will not sleep when $still_running > 0, and instead spam-calls the asynchronous function curl_multi_exec() until everything has been downloaded, which causes PHP to use 100% CPU (of 1 CPU core) until everything has been downloaded; in other words, it fails to sleep while downloading.

Here's an approach with neither of those issues:

$websites = array(
    "http://google.com",
    "http://example.org"
    // $url2,
    // $url3,
    // ...
    // $url15
);
$mh = curl_multi_init();
foreach ($websites as $website) {
    $worker = curl_init($website);
    curl_setopt_array($worker, [
        CURLOPT_RETURNTRANSFER => 1
    ]);
    curl_multi_add_handle($mh, $worker);
}
for (;;) {
    $still_running = null;
    do {
        $err = curl_multi_exec($mh, $still_running);
    } while ($err === CURLM_CALL_MULTI_PERFORM);
    if ($err !== CURLM_OK) {
        // handle curl multi error?
    }
    if ($still_running < 1) {
        // all downloads completed
        break;
    }
    // some haven't finished downloading, sleep until more data arrives:
    curl_multi_select($mh, 1);
}
$results = [];
while (false !== ($info = curl_multi_info_read($mh))) {
    if ($info["result"] !== CURLE_OK) {
        // handle download error?
    }
    $results[curl_getinfo($info["handle"], CURLINFO_EFFECTIVE_URL)] = curl_multi_getcontent($info["handle"]);
    curl_multi_remove_handle($mh, $info["handle"]);
    curl_close($info["handle"]);
}
curl_multi_close($mh);
var_export($results);

Note that an issue shared by all 3 approaches here (my answer, Sudhir's answer, and Timo's answer) is that they open all connections simultaneously: if you have 1,000,000 websites to fetch, these scripts will try to open 1,000,000 connections at once. If you only need to download, say, 50 websites at a time, maybe try:

$websites = array(
    "http://google.com",
    "http://example.org"
    // $url2,
    // $url3,
    // ...
    // $url15
);
var_dump(fetch_urls($websites, 50));

function fetch_urls(array $urls, int $max_connections, int $timeout_ms = 10000, bool $return_fault_reason = true): array
{
    if ($max_connections < 1) {
        throw new InvalidArgumentException("max_connections MUST be >=1");
    }
    foreach ($urls as $key => $foo) {
        if (! is_string($foo)) {
            throw new \InvalidArgumentException("all urls must be strings!");
        }
        if (empty($foo)) {
            unset($urls[$key]); // ?
        }
    }
    unset($foo);
    // DISABLED for benchmarking purposes: $urls = array_unique($urls); // remove duplicates.
    $ret = array();
    $mh = curl_multi_init();
    $workers = array();
    $work = function () use (&$ret, &$workers, &$mh, $return_fault_reason) {
        // > If an added handle fails very quickly, it may never be counted as a running_handle
        while (1) {
            do {
                $err = curl_multi_exec($mh, $still_running);
            } while ($err === CURLM_CALL_MULTI_PERFORM);
            if ($still_running < count($workers)) {
                // some workers finished, fetch their response and close them
                break;
            }
            $cms = curl_multi_select($mh, 1);
            // var_dump('sr: ' . $still_running . " c: " . count($workers) . " cms: " . $cms);
        }
        while (false !== ($info = curl_multi_info_read($mh))) {
            // echo "NOT FALSE!";
            // var_dump($info);
            {
                if ($info['msg'] !== CURLMSG_DONE) {
                    continue;
                }
                if ($info['result'] !== CURLE_OK) {
                    if ($return_fault_reason) {
                        $ret[$workers[(int) $info['handle']]] = print_r(array(
                            false,
                            $info['result'],
                            "curl_exec error " . $info['result'] . ": " . curl_strerror($info['result'])
                        ), true);
                    }
                } elseif (CURLE_OK !== ($err = curl_errno($info['handle']))) {
                    if ($return_fault_reason) {
                        $ret[$workers[(int) $info['handle']]] = print_r(array(
                            false,
                            $err,
                            "curl error " . $err . ": " . curl_strerror($err)
                        ), true);
                    }
                } else {
                    $ret[$workers[(int) $info['handle']]] = curl_multi_getcontent($info['handle']);
                }
                curl_multi_remove_handle($mh, $info['handle']);
                assert(isset($workers[(int) $info['handle']]));
                unset($workers[(int) $info['handle']]);
                curl_close($info['handle']);
            }
        }
        // echo "NO MORE INFO!";
    };
    foreach ($urls as $url) {
        while (count($workers) >= $max_connections) {
            // echo "TOO MANY WORKERS!\n";
            $work();
        }
        $neww = curl_init($url);
        if (! $neww) {
            trigger_error("curl_init() failed! probably means that max_connections is too high and you ran out of system resources", E_USER_WARNING);
            if ($return_fault_reason) {
                $ret[$url] = array(
                    false,
                    -1,
                    "curl_init() failed"
                );
            }
            continue;
        }
        $workers[(int) $neww] = $url;
        curl_setopt_array($neww, array(
            CURLOPT_RETURNTRANSFER => 1,
            CURLOPT_SSL_VERIFYHOST => 0,
            CURLOPT_SSL_VERIFYPEER => 0,
            CURLOPT_TIMEOUT_MS => $timeout_ms
        ));
        curl_multi_add_handle($mh, $neww);
        // curl_multi_exec($mh, $unused_here); LIKELY TO BE MUCH SLOWER IF DONE IN THIS LOOP: TOO MANY SYSCALLS
    }
    while (count($workers) > 0) {
        // echo "WAITING FOR WORKERS TO BECOME 0!";
        // var_dump(count($workers));
        $work();
    }
    curl_multi_close($mh);
    return $ret;
}

That will download the entire list while never downloading more than 50 URLs simultaneously. (But even that approach stores all the results in RAM, so it may still end up running out of memory; if you want to store the results in a database instead of in RAM, the curl_multi_getcontent part can be modified to write to a database rather than to a RAM-persistent variable, as sketched below.)
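As a rough illustration of that modification (the PDO connection, the SQLite file name and the responses table are made-up assumptions, not part of the original answer), a small helper could persist each finished download as soon as it is read:

<?php
// a minimal sketch, assuming SQLite via PDO; the file and table names are hypothetical
$pdo = new PDO('sqlite:responses.db');
$pdo->exec('CREATE TABLE IF NOT EXISTS responses (url TEXT PRIMARY KEY, body TEXT)');

// persists one finished download instead of keeping it in a PHP array
function store_response(PDO $pdo, string $url, string $body): void
{
    $stmt = $pdo->prepare('REPLACE INTO responses (url, body) VALUES (?, ?)');
    $stmt->execute([$url, $body]);
}

// inside the $work closure above, the line
//     $ret[$workers[(int) $info['handle']]] = curl_multi_getcontent($info['handle']);
// could then become
//     store_response($pdo, $workers[(int) $info['handle']], curl_multi_getcontent($info['handle']));
// so the response body never stays in a RAM-persistent variable

With that in place, $ret would only need to hold error entries (or nothing at all), so memory use stays roughly flat no matter how many URLs are fetched.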

answered Sep 26 '22 by hanshenrik