 

Running file_put_contents in parallel?

I was searching Stack Overflow for a solution, but couldn't find anything even close to what I am trying to achieve. Perhaps I am just blissfully unaware of some magic PHP sauce everyone uses to tackle this problem... ;)

Basically I have an array with, give or take, a few hundred URLs pointing to different XML files on a remote server. I'm doing some magic file-checking to see if the content of the XML files has changed, and if it has, I'll download the newer XMLs to my server.

PHP code:

$urls = array(
    'http://stackoverflow.com/a-really-nice-file.xml',
    'http://stackoverflow.com/another-cool-file2.xml'
);
foreach($urls as $url){
    set_time_limit(0);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, false);
    $contents = curl_exec($ch);
    curl_close($ch);
    // $filename is set elsewhere (explained below)
    file_put_contents($filename, $contents);
}

Now, $filename is set somewhere else and gives each XML its own ID based on my logic. So far this script runs OK and does what it should, but it does it terribly slowly. I know my server can handle a lot more, and I suspect my foreach is slowing down the process.

Is there any way I can speed up the foreach? Currently I am thinking of handling 10 or 20 downloads/file_put_contents calls at once instead of one per loop iteration, basically cutting my execution time 10- or 20-fold, but I can't think of the best and most performant way to approach this. Any help or pointers on how to proceed?

David K. asked Oct 05 '12

2 Answers

Your bottleneck is (most likely) the curl requests; you can only write to a file after each request is done, and there is no way (in a single, sequential script) to speed up that process.

I don't know how it all works in your case, but you can execute curl requests in parallel: http://php.net/manual/en/function.curl-multi-exec.php.

Maybe you can fetch all the data first (if memory is available to store it) and then write out the files as the requests complete.
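
Something like the following (an untested sketch, not code from the question) shows how curl_multi could replace the sequential loop; getFilename() is a hypothetical stand-in for whatever logic sets $filename in the original script:

// Sketch only: fetch all URLs in parallel with curl_multi, then write the files.
set_time_limit(0);

$mh = curl_multi_init();
$handles = array();

// one easy handle per URL, all registered on the multi handle
foreach ($urls as $url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// drive all transfers until every handle has finished
$running = null;
do {
    $status = curl_multi_exec($mh, $running);
    if ($running > 0) {
        curl_multi_select($mh); // wait for activity instead of spinning
    }
} while ($running > 0 && $status === CURLM_OK);

// collect each response and write it to disk
foreach ($handles as $url => $ch) {
    $contents = curl_multi_getcontent($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($httpCode === 200 && $contents !== null) {
        file_put_contents(getFilename($url), $contents); // getFilename() is hypothetical
    }
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);

With a few hundred URLs it may be worth adding handles in batches of 10 or 20 rather than all at once, which also lines up with the batching idea in the question.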

aknosis answered Nov 03 '22


Just run more scripts. Each script will download some of the URLs.

You can get more information about this pattern here: http://en.wikipedia.org/wiki/Thread_pool_pattern

The more scripts you run, the more parallelism you get. A rough sketch of the idea is below.
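
An untested sketch of that approach, assuming a Unix-like shell (so "&" backgrounds a process) and a hypothetical worker.php that runs the question's original fetch-and-save loop over the URLs it is handed:

// launcher.php -- sketch: fan the URL list out to several worker processes
$urls = array(/* ... the few hundred XML URLs ... */);
$workers = 10; // how many scripts to run in parallel

$chunks = array_chunk($urls, (int) ceil(count($urls) / $workers));

foreach ($chunks as $i => $chunk) {
    // hand each worker its share of URLs via a temp file
    file_put_contents("chunk_$i.json", json_encode($chunk));
    // start the worker in the background so all of them run at the same time
    exec("php worker.php chunk_$i.json > /dev/null 2>&1 &");
}

// worker.php -- sketch: read the chunk file and run the original loop over it
$urls = json_decode(file_get_contents($argv[1]), true);
foreach ($urls as $url) {
    // ... same curl_init()/curl_exec()/file_put_contents() logic as in the question ...
}

The number of workers is the tuning knob here: more workers means more concurrent downloads, up to whatever the server and the remote host can handle.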

dynamic answered Nov 03 '22