I am doing a simple app that reads json data from 15 different URLs. I have a special need that I need to do this serverly. I am using file_get_contents($url)
.
Since I am using file_get_contents($url). I wrote a simple script, is it:
$websites = array( $url1, $url2, $url3, ... $url15 ); foreach ($websites as $website) { $data[] = file_get_contents($website); }
and it was proven to be very slow, because it waits for the first request and then do the next one.
The solution to this is to use the xargs command as shown alongside the curl command. The -P flag denotes the number of requests in parallel. The section <(printf '%s\n' {1.. 10}) prints out the numbers 1 – 10 and causes the curl command to run 10 times with 5 requests running in parallel.
Short answer is no it isn't asynchronous. Longer answer is "Not unless you wrote the backend yourself to do so." If you're using XHR, each request is going to have a different worker thread on the backend which means no request should block any other, barring hitting process and memory limits.
To use the multi interface, you must first create a 'multi handle' with curl_multi_init. This handle is then used as input to all further curl_multi_* functions. With a multi handle and the multi interface you can do several simultaneous transfers in parallel. Each single transfer is built up around an easy handle.
curl_multi_exec(CurlMultiHandle $multi_handle , int &$still_running ): int. Processes each of the handles in the stack. This method can be called whether or not a handle needs to read or write data.
If you mean multi-curl then, something like this might help:
$nodes = array($url1, $url2, $url3); $node_count = count($nodes); $curl_arr = array(); $master = curl_multi_init(); for($i = 0; $i < $node_count; $i++) { $url =$nodes[$i]; $curl_arr[$i] = curl_init($url); curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true); curl_multi_add_handle($master, $curl_arr[$i]); } do { curl_multi_exec($master,$running); } while($running > 0); for($i = 0; $i < $node_count; $i++) { $results[] = curl_multi_getcontent ( $curl_arr[$i] ); } print_r($results);
Hope it helps in some way
i don't particularly like the approach of any of the existing answers
Timo's code: might sleep/select() during CURLM_CALL_MULTI_PERFORM which is wrong, it might also fail to sleep when ($still_running > 0 && $exec != CURLM_CALL_MULTI_PERFORM) which may make the code spin at 100% cpu usage (of 1 core) for no reason
Sudhir's code: will not sleep when $still_running > 0 , and spam-call the async-function curl_multi_exec() until everything has been downloaded, which cause php to use 100% cpu (of 1 cpu core) until everything has been downloaded, in other words it fails to sleep while downloading
here's an approach with neither of those issues:
$websites = array( "http://google.com", "http://example.org" // $url2, // $url3, // ... // $url15 ); $mh = curl_multi_init(); foreach ($websites as $website) { $worker = curl_init($website); curl_setopt_array($worker, [ CURLOPT_RETURNTRANSFER => 1 ]); curl_multi_add_handle($mh, $worker); } for (;;) { $still_running = null; do { $err = curl_multi_exec($mh, $still_running); } while ($err === CURLM_CALL_MULTI_PERFORM); if ($err !== CURLM_OK) { // handle curl multi error? } if ($still_running < 1) { // all downloads completed break; } // some haven't finished downloading, sleep until more data arrives: curl_multi_select($mh, 1); } $results = []; while (false !== ($info = curl_multi_info_read($mh))) { if ($info["result"] !== CURLE_OK) { // handle download error? } $results[curl_getinfo($info["handle"], CURLINFO_EFFECTIVE_URL)] = curl_multi_getcontent($info["handle"]); curl_multi_remove_handle($mh, $info["handle"]); curl_close($info["handle"]); } curl_multi_close($mh); var_export($results);
note that an issue shared by all 3 approaches here (my answer, and Sudhir's answer, and Timo's answer) is that they will open all connections simultaneously, if you have 1,000,000 websites to fetch, these scripts will try to open 1,000,000 connections simultaneously. if you need to like.. only download 50 websites at a time, or something like that, maybe try:
$websites = array( "http://google.com", "http://example.org" // $url2, // $url3, // ... // $url15 ); var_dump(fetch_urls($websites,50)); function fetch_urls(array $urls, int $max_connections, int $timeout_ms = 10000, bool $return_fault_reason = true): array { if ($max_connections < 1) { throw new InvalidArgumentException("max_connections MUST be >=1"); } foreach ($urls as $key => $foo) { if (! is_string($foo)) { throw new \InvalidArgumentException("all urls must be strings!"); } if (empty($foo)) { unset($urls[$key]); // ? } } unset($foo); // DISABLED for benchmarking purposes: $urls = array_unique($urls); // remove duplicates. $ret = array(); $mh = curl_multi_init(); $workers = array(); $work = function () use (&$ret, &$workers, &$mh, $return_fault_reason) { // > If an added handle fails very quickly, it may never be counted as a running_handle while (1) { do { $err = curl_multi_exec($mh, $still_running); } while ($err === CURLM_CALL_MULTI_PERFORM); if ($still_running < count($workers)) { // some workers finished, fetch their response and close them break; } $cms = curl_multi_select($mh, 1); // var_dump('sr: ' . $still_running . " c: " . count($workers)." cms: ".$cms); } while (false !== ($info = curl_multi_info_read($mh))) { // echo "NOT FALSE!"; // var_dump($info); { if ($info['msg'] !== CURLMSG_DONE) { continue; } if ($info['result'] !== CURLE_OK) { if ($return_fault_reason) { $ret[$workers[(int) $info['handle']]] = print_r(array( false, $info['result'], "curl_exec error " . $info['result'] . ": " . curl_strerror($info['result']) ), true); } } elseif (CURLE_OK !== ($err = curl_errno($info['handle']))) { if ($return_fault_reason) { $ret[$workers[(int) $info['handle']]] = print_r(array( false, $err, "curl error " . $err . ": " . curl_strerror($err) ), true); } } else { $ret[$workers[(int) $info['handle']]] = curl_multi_getcontent($info['handle']); } curl_multi_remove_handle($mh, $info['handle']); assert(isset($workers[(int) $info['handle']])); unset($workers[(int) $info['handle']]); curl_close($info['handle']); } } // echo "NO MORE INFO!"; }; foreach ($urls as $url) { while (count($workers) >= $max_connections) { // echo "TOO MANY WORKERS!\n"; $work(); } $neww = curl_init($url); if (! $neww) { trigger_error("curl_init() failed! probably means that max_connections is too high and you ran out of system resources", E_USER_WARNING); if ($return_fault_reason) { $ret[$url] = array( false, - 1, "curl_init() failed" ); } continue; } $workers[(int) $neww] = $url; curl_setopt_array($neww, array( CURLOPT_RETURNTRANSFER => 1, CURLOPT_SSL_VERIFYHOST => 0, CURLOPT_SSL_VERIFYPEER => 0, CURLOPT_TIMEOUT_MS => $timeout_ms )); curl_multi_add_handle($mh, $neww); // curl_multi_exec($mh, $unused_here); LIKELY TO BE MUCH SLOWER IF DONE IN THIS LOOP: TOO MANY SYSCALLS } while (count($workers) > 0) { // echo "WAITING FOR WORKERS TO BECOME 0!"; // var_dump(count($workers)); $work(); } curl_multi_close($mh); return $ret; }
that will download the entire list and not download more than 50 urls simultaneously (but even that approach stores all the results in-ram, so even that approach may end up running out of ram; if you want to store it in a database instead of in ram, the curl_multi_getcontent part can be modified to store it in a database instead of in a ram-persistent variable.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With