 

Inconsistencies with CURL Multi PHP

Tags: php, curl

When I run a check on 10 URLs and I am able to connect to the host server, each handle returns a success result (CURLE_OK).

If a server refuses the connection, the handle carries an error result instead.

The problem

I assumed that when we get a bad handle, cURL would mark that handle but continue to process the remaining handles. That is not what seems to happen: when we come across a bad handle, cURL marks it as bad but does not process the remaining unprocessed handles.

This can be hard to detect. If every handle gets a connection, which is what happens most of the time, the problem is not visible (cURL only stops at the first bad connection).

For the test, I had to find a suitable site that loads slowly or refuses connections beyond a certain number of simultaneous connections.

set_time_limit(0);

$l = array(
    'http://smotri.com/video/list/',
    'http://smotri.com/video/list/sports/',
    'http://smotri.com/video/list/animals/',
    'http://smotri.com/video/list/travel/',
    'http://smotri.com/video/list/hobby/',
    'http://smotri.com/video/list/gaming/',
    'http://smotri.com/video/list/mult/',
    'http://smotri.com/video/list/erotic/',
    'http://smotri.com/video/list/auto/',
    'http://smotri.com/video/list/humour/',
    'http://smotri.com/video/list/film/'
);


$mh = curl_multi_init();

$s = 0;
$f = 10;

while($s <= $f)
{   

    $ch = curl_init();  

    $curlsettings = array(
        CURLOPT_URL => $l[$s],
        CURLOPT_TIMEOUT => 0,
        CURLOPT_CONNECTTIMEOUT => 0,
        CURLOPT_RETURNTRANSFER => 1
    );

    curl_setopt_array($ch, $curlsettings);
    curl_multi_add_handle($mh,$ch);

    $s++;

}

$active = null;

do 
{
    curl_multi_exec($mh,$active);
    curl_multi_select($mh);

    $info = curl_multi_info_read($mh);

    echo '<pre>';
    var_dump($info);

    if($info['result'] === CURLE_OK)
        echo curl_getinfo($info['handle'],CURLINFO_EFFECTIVE_URL) . ' success<br>';

    if($info['result'] != 0)
        echo curl_getinfo($info['handle'],CURLINFO_EFFECTIVE_URL) . ' failed<br>';

} while ($active > 0);

curl_multi_close($mh);

I dump $info in the script; curl_multi_info_read asks the multi handle whether there is new information on any handle while it is running. When the script has ended, the output contains some bool(false) entries (no new information was available because handles were still processing), along with all handles if everything succeeded, or only some of the handles if one failed.

I have failed to fix this. It's probably something I have overlooked, and I have gone too far down the road of trying to fix things that are not relevant.

Some of my attempts at fixing this were:

  • Assigning each $ch handle to an array element - $ch[1], $ch[2] etc. (instead of adding the current $ch handle to the multi handle and then overwriting it, as in the test)

  • Removing handles after success/failure with curl_multi_remove_handle

  • Setting CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT to infinity

  • Many more (I will update this post, as I have forgotten all of them)

I am testing this with PHP version 5.4.14. Hopefully I have illustrated the points well enough.

Thanks for reading.

asked Mar 20 '14 16:03 by cecilli0n


1 Answer

I've been mucking around with your script for a while now trying to get it to work.
It was only when I read "Repeated calls to this function will return a new result each time, until a FALSE is returned as a signal that there is no more to get at this point." in the documentation at http://se2.php.net/manual/en/function.curl-multi-info-read.php that I realized a while loop might work.

The extra while loop makes it behave exactly how you'd expect. Here is the output I get:

http://smotri.com/video/list/sports/ failed

http://smotri.com/video/list/travel/ failed

http://smotri.com/video/list/gaming/ failed

http://smotri.com/video/list/erotic/ failed

http://smotri.com/video/list/humour/ failed

http://smotri.com/video/list/animals/ success

http://smotri.com/video/list/film/ success

http://smotri.com/video/list/auto/ success

http://smotri.com/video/list/ failed

http://smotri.com/video/list/hobby/ failed

http://smotri.com/video/list/mult/ failed


Here's the code I used for testing:

<?php
set_time_limit(0);

$l = array(
    'http://smotri.com/video/list/',
    'http://smotri.com/video/list/sports/',
    'http://smotri.com/video/list/animals/',
    'http://smotri.com/video/list/travel/',
    'http://smotri.com/video/list/hobby/',
    'http://smotri.com/video/list/gaming/',
    'http://smotri.com/video/list/mult/',
    'http://smotri.com/video/list/erotic/',
    'http://smotri.com/video/list/auto/',
    'http://smotri.com/video/list/humour/',
    'http://smotri.com/video/list/film/'
);

$mh = curl_multi_init();

$s = 0;
$f = 10;

while($s <= $f)
{   
    $ch = curl_init();  

    if($s%2)
    {
        $curlsettings = array(
            CURLOPT_URL => $l[$s],
            CURLOPT_TIMEOUT_MS => 3000,
            CURLOPT_RETURNTRANSFER => 1,
        );
    }
    else
    {
        $curlsettings = array(
            CURLOPT_URL => $l[$s],
            CURLOPT_TIMEOUT_MS => 4000,
            CURLOPT_RETURNTRANSFER => 1,
        );
    }

    curl_setopt_array($ch, $curlsettings);
    curl_multi_add_handle($mh,$ch);
    $s++;
}

$active = null;

do 
{

    $mrc = curl_multi_exec($mh,$active);
    curl_multi_select($mh);

    while($info = curl_multi_info_read($mh))
    {
        echo '<pre>';
        //var_dump($info);

        if($info['result'] === 0)
        {
            echo curl_getinfo($info['handle'],CURLINFO_EFFECTIVE_URL) . ' success<br>';
        }
        else
        {
            echo curl_getinfo($info['handle'],CURLINFO_EFFECTIVE_URL) . ' failed<br>';
        }   
    }

} while ($active > 0);

curl_multi_close($mh);


Hope that helps. For testing, adjust CURLOPT_TIMEOUT_MS to suit your internet connection. I made it alternate between 3000 and 4000 milliseconds because, in my tests, 3000 fails and 4000 usually succeeds.

Update

After going through the PHP and libcurl docs I have found out how curl_multi_exec works (in libcurl it's curl_multi_perform). When first called, it starts handling transfers for all the handles added beforehand via curl_multi_add_handle.

The number it assigns $active is the number of transfers still running. So if it's less than the total number of handles you have then you know one or more transfers are complete. So curl_multi_exec acts as a kind of progress indicator as well.
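To make the progress-indicator idea concrete, here is a minimal sketch that records how many transfers have finished after each curl_multi_exec call. The function name track_progress and the 4000 ms timeout are assumptions for illustration, not part of the original script.

```php
<?php
// Hypothetical helper: runs every URL through one multi handle and records,
// after each curl_multi_exec call, how many transfers have finished so far.
function track_progress(array $urls)
{
    $mh = curl_multi_init();
    $handles = array();

    foreach ($urls as $url) {
        $ch = curl_init();
        curl_setopt_array($ch, array(
            CURLOPT_URL => $url,
            CURLOPT_RETURNTRANSFER => 1,
            CURLOPT_TIMEOUT_MS => 4000, // assumed value, adjust as needed
        ));
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }

    $total = count($handles);
    $active = null;
    $progress = array();

    do {
        curl_multi_exec($mh, $active);
        // Finished so far = handles added minus transfers still running.
        $progress[] = $total - $active;
        if ($active > 0) {
            // Wait up to one second for activity before polling again.
            curl_multi_select($mh, 1.0);
        }
    } while ($active > 0);

    foreach ($handles as $ch) {
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);

    return $progress;
}
```

The last entry of the returned array should equal the number of URLs passed in, whether the transfers succeeded or failed, because $active only counts transfers that are still running.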

As all transfers are handled in a non-blocking fashion (several transfers can finish at the same time), one iteration of the loop that calls curl_multi_exec does not correspond to one completed URL request.

All data is stored in a queue so as soon as one or more transfers are complete you can call curl_multi_info_read to fetch this data.

In my original answer I had curl_multi_info_read in a while loop. This loop keeps iterating until curl_multi_info_read finds no remaining data in the queue, after which the outer loop moves on to the next iteration if $active != 0 (meaning curl_multi_exec reported transfers that are still not complete).

To summarize, the outer loop keeps iterating when there are still transfers not completed and the inner loop iterates only when there's data from a completed transfer.

The PHP documentation is pretty bad for curl multi functions so I hope this cleared a few things up. Below is an alternative way to do the same thing.

do 
{
    curl_multi_exec($mh,$active);
} while ($active > 0);

// while($info = curl_multi_info_read($mh)) would work also here
for($i = 0; $i <= $f; $i++){
    $info = curl_multi_info_read($mh);

    if($info['result'] === 0)
    {
        echo curl_getinfo($info['handle'],CURLINFO_EFFECTIVE_URL) . ' success<br>';
    }
    else
    {
        echo curl_getinfo($info['handle'],CURLINFO_EFFECTIVE_URL) . ' failed<br>';
    }
}


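From this information you can also see that curl_multi_select is not required to get correct results, as nothing here needs to block until there is activity. Note that without it the do/while loop above spins continuously until every transfer has finished, which uses more CPU.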
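If the CPU cost of polling matters, the loop can wait inside curl_multi_select between calls and still drain the queue on every pass. The following is only a sketch: the function name drive_multi, the default timeout and the 1 ms pause are assumptions, and the check for CURLM_CALL_MULTI_PERFORM is there only for older libcurl builds that may still return it.

```php
<?php
// Hypothetical helper: drives an already-populated multi handle to completion,
// waiting in curl_multi_select instead of spinning, and returns every message
// that curl_multi_info_read produced.
function drive_multi($mh, $timeout = 1.0)
{
    $messages = array();
    $active = null;

    do {
        $status = curl_multi_exec($mh, $active);

        // Drain the queue completely on every pass.
        while ($info = curl_multi_info_read($mh)) {
            $messages[] = $info;
        }

        if ($active > 0 && curl_multi_select($mh, $timeout) === -1) {
            // -1 means there was nothing to wait on; pause briefly
            // so the loop does not spin at full speed.
            usleep(1000);
        }
    } while ($active > 0
        && ($status === CURLM_OK || $status === CURLM_CALL_MULTI_PERFORM));

    return $messages;
}
```

Called as drive_multi($mh) after the curl_multi_add_handle loop, it returns one message per handle once all transfers have ended, assuming curl_multi_exec did not report an error.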

With the code you provided in your question, it only seemed as though cURL wasn't proceeding after a few failed transfers; there was actually still data queued in the buffer. Your code just wasn't calling curl_multi_info_read enough times. The successful transfers were all picked up by your code because PHP runs on a single thread, so the script hung waiting for those requests. The timeouts for the failed requests didn't make PHP hang/wait that long, so the while loop ran fewer iterations than there were queued messages.
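One way to confirm that no message is left behind is to collect every result into an array keyed by URL and compare its size with the number of URLs added. This is a rough sketch rather than a drop-in replacement: fetch_all and the shape of the result array are assumptions, and duplicate URLs would overwrite each other in the map.

```php
<?php
// Hypothetical helper: fetches all URLs and returns one entry per effective
// URL, recording whether the transfer succeeded and its CURLE_* result code.
function fetch_all(array $urls, $timeoutMs = 4000)
{
    $mh = curl_multi_init();

    foreach ($urls as $url) {
        $ch = curl_init();
        curl_setopt_array($ch, array(
            CURLOPT_URL => $url,
            CURLOPT_TIMEOUT_MS => $timeoutMs,
            CURLOPT_RETURNTRANSFER => 1,
        ));
        curl_multi_add_handle($mh, $ch);
    }

    $results = array();
    $active = null;

    do {
        curl_multi_exec($mh, $active);

        // Read until FALSE so several transfers that finished
        // during the same pass are all reported.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
            $results[$url] = array(
                'ok' => $info['result'] === CURLE_OK,
                'code' => $info['result'],
            );
            curl_multi_remove_handle($mh, $ch);
        }

        if ($active > 0) {
            curl_multi_select($mh, 1.0);
        }
    } while ($active > 0);

    curl_multi_close($mh);

    return $results;
}
```

If count(fetch_all($l)) is smaller than count($l) for a list of distinct URLs, some messages were not read; with the inner while loop in place that should not happen.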

answered Sep 19 '22 23:09 by James T