Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does mysqli_poll work?

Tags:

php

mysql

mysqli

The documentation on mysqli_poll is a bit sparse.

In the example, they create 3 identical arrays containing all the MySQLi connections to check, but if you read the parameter descriptions, that doesn't look right at all.

It sounds to me like $read is an array holding the connections to check, but $error and $reject should be unpopulated vars that will be filled by the function if there are errors. Is that correct?

What happens when the function returns >= 1? How do you know which connections have data ready to be "reaped"? Is $read modified too? i.e. reduced to the set connections that actually have data?

Lastly, do sec and usec actually do anything? If so, what? I tried setting sec to 0 and usec to 1 (I assume that means 0 seconds + 1 microsecond = 1 microsecond total wait time) but it pauses for more than a second when I run a big query, so it doesn't seem to abort or cause an error when it times out. What does it do?

like image 257
mpen Avatar asked Sep 15 '15 00:09

mpen


1 Answers

TL;DR: Your suppositions are correct, but take care in your coding.


mysqli_poll is a thin convenience wrapper around a socket select, and knowing how socket_select works will get you a long way toward understanding this function.

mysqli_poll is only available when the underlying driver is mysqlnd, because only MySQL ND provides native, socket-level access to the MySQL server. The important point to keep in mind is that the "socket-level" access is what makes polling possible, and why understanding socket select is crucial in understanding the function and limitations of mysqli_poll.

To answer your questions:

It sounds to me like $read is an array holding the connections to check, but $error and $reject should be unpopulated vars that will be filled by the function if there are errors. Is that correct?

Yes, but not a complete picture. Notice the signature of mysqli_poll:

int mysqli_poll ( array &$read , array &$error , array &$reject , int $sec [, int $usec ] )

All three arrays are pass-by-reference, meaning the PHP engine has the ability to modify all three, which it will. In the obvious case, it modifies $error and $reject when any of the requested connections from $read are in an error or connection rejected state.

But PHP will also modify $read when there is data waiting to be read. This is the key to answering your question:

What happens when the function returns >= 1? How do you know which connections have data ready to be "reaped"? Is $read modified too? i.e. reduced to the set connections that actually have data?

Yes, and that is crucial and not obvious in the docs. $read is going to get modified to the list of connections that are ready to be read. You'll loop over them and do your business. But an imperative point is glossed over: if $read is modified, what happens if you put the poll in a loop and try to read from those again? Well, you'll only be reading from a subset, which isn't what you want.

What most examples show when doing a select in PHP is that the source $read array is copied to a new array before being select'd. In the man page for mysqli_poll, notice this loop that "resets" the read array just before the call to the mysqli_poll:

foreach ($all_links as $link) {
    $links[] = $errors[] = $reject[] = $link;
}

This is perhaps the most important point: each of these arrays passed into mysqli_poll will get modified when mysqli_poll finishes: the arrays will be trimmed so that only affected connections are in the result, so you have to reset the arrays each time before you call mysqli_poll.

Another example is seen in this PHP note on socket_select. Notice how $read = $clients; before the select?

To your last question:

Lastly, do sec and usec actually do anything? If so, what? I tried setting sec to 0 and usec to 1 (I assume that means 0 seconds + 1 microsecond = 1 microsecond total wait time) but it pauses for more than a second when I run a big query, so it doesn't seem to abort or cause an error when it times out. What does it do?

Yes, it works. These should represent the upper-bound PHP will wait for data to become available on any of the connections in $read (but read on). It did not work for you because the minimum time is 1 second. When you set 0 for second, even though you had a > 0 microsecond, PHP interpreted that as "wait forever".

As a side note, the unit tests for mysqli_poll might be illuminating.


Update: I wasn't near a computer to test last night. Now that I am, I've got some observations to share.

test 1: long running queries

$ cat mysqli_poll_test
$link  = mysqli_connect(...);
$sql   = 'SELECT SLEEP(2), 1';
mysqli_query($link, $sql, MYSQLI_ASYNC);

$links = array ($link);
$begin = microtime(true);
$i = 0;
do {
    printf("start i=%d @ T+%.f\n", $i, (microtime(true)-$begin));
    $read = $error = $reject = $links;
    mysqli_poll($read, $error, $reject, 1, 500000);
    printf(
        "finish i=%d, count(read, error, reject)=(%d, %d, %d) @ T+%f\n\n",
        $i++, count($read), count($error), count($reject), (microtime(true)-$begin)
    );
} while (count($links) !== count($read) + count($error) + count($reject));

$ php mysqli_poll_test
start i=0 @ T+0.000012
finish i=0, count(read, error, reject)=(0, 0, 0) @ T+1.501807

start i=1 @ T+1.501955
finish i=1, count(read, error, reject)=(1, 0, 0) @ T+2.001353

In this test, the long running query is a simple sleep for 2 seconds at the MySQL server. The mysqli_poll has a timeout of 1.5 seconds. As expected, after 1.5 seconds elapse, the poll returns. Also as expected, there is no data ready to be read, so the do .. while restarts. After the remaining half-second, the poll returns indicating one link is ready to read. This is expected, because the query takes only 2 seconds to resolve and poll sees that very close to exactly two seconds.

If you change the poll timeout to half a second and re-run:

// changed this from 1 to 0 --------V
mysqli_poll($read, $error, $reject, 0, 500000);

The poll kicks out after half a second, and the loop runs four times, as expected. If you change it to 1 microsecond as in your example, it does kick out after 1 microsecond. And if you change it to 0 seconds and 0 microseconds, it runs as fast as it possibly can.

So, I was definitely wrong when I said 0 meant wait forever.

test 2: multiple queries, some error and some long running, with timeouts

Let's change our script to have a few more links and try again:

$link0  = mysqli_connect(...);
$link1  = mysqli_connect(...);
$link2  = mysqli_connect(...);

$sql0   = 'SELECT SLEEP(2) AS wait, 1 AS num';
$sql1   = 'SELECT foo FROM';
$sql2   = 'SELECT 2 AS num';

mysqli_query($link0, $sql0, MYSQLI_ASYNC);
mysqli_query($link1, $sql1, MYSQLI_ASYNC);
mysqli_query($link2, $sql2, MYSQLI_ASYNC);

$links  = array ($link0, $link1, $link2);
$begin = microtime(true);
$i = 0;
do {
    printf("start i=%d @ T+%.f\n", $i, (microtime(true)-$begin));
    $read = $error = $reject = $links;
    $count = mysqli_poll($read, $error, $reject, 1, 500000);
    if (0 < $count) {
        foreach ($links as $j => $link) {
            $result = mysqli_reap_async_query($link);
            if (is_object($result)) {
                printf("link #%d, row=%s\n", $j, json_encode($result->fetch_assoc()));
                mysqli_free_result($result);
            } else if (false !== $result) {
                printf("link #%d, output=%s\n", $j, $link);
            } else {
                printf("link #%d, error=%s\n", $j, mysqli_error($link));
            }
        }
    }
    printf(
        "finish i=%d, count(read, error, reject)=(%d, %d, %d) @ T+%f\n\n",
        $i++, count($read), count($error), count($reject), (microtime(true)-$begin)
    );
} while (count($links) !== count($read) + count($error) + count($reject));

In this test, I'm expecting two results to resolve immediately: one a syntax error and the other a data row. I'm also expecting this to take 1.5 seconds, since the query sleeping 2 seconds will not resolve until after the timeout expires. That doesn't appear to be the case:

start i=0 @ T+0.000002
link #0, row={"wait":"0","num":"1"}
link #1, error=You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1
link #2, row={"num":"2"}
finish i=0, count(read, error, reject)=(1, 0, 0) @ T+2.001756

start i=1 @ T+2.001827
finish i=1, count(read, error, reject)=(0, 0, 3) @ T+3.503024

It waits until the SLEEP(2) query resolves, violating the assertion that the timeout was the upper-bound for waiting. The reason this happens is the mysqli_reap_async_query: we're iterating over all links, and each of those is being asked to be reaped. The reaping process waits until the query finishes.

test 3: long running queries with targeted reaping:

Same as test #2, but this time let's be smart about what we're reaping.

$ cat mysqli_poll.php
<?php
$link0  = mysqli_connect(...);
$link1  = mysqli_connect(...);
$link2  = mysqli_connect(...);

$sql0   = 'SELECT SLEEP(2) AS wait, 1 AS num';
$sql1   = 'SELECT foo FROM';
$sql2   = 'SELECT 2 AS num';

mysqli_query($link0, $sql0, MYSQLI_ASYNC);
mysqli_query($link1, $sql1, MYSQLI_ASYNC);
mysqli_query($link2, $sql2, MYSQLI_ASYNC);

$links  = array ($link0, $link1, $link2);
$begin = microtime(true);
$i = 0;
do {
    printf("start i=%d @ T+%.f\n", $i, (microtime(true)-$begin));
    $read = $error = $reject = $links;
    $count = mysqli_poll($read, $error, $reject, 1, 500000);
    printf(
        "check i=%d, count(read, error, reject)=(%d, %d, %d) @ T+%f\n",
        $i, count($read), count($error), count($reject), (microtime(true)-$begin)
    );
    if (0 < $count) {
        reap('read', $read);
        reap('error', $error);
        reap('reject', $reject);
    } else {
        printf("timeout, no results\n");
    }
    printf("finish i=%d\n\n", $i++);
} while (count($links) !== count($read) + count($error) + count($reject));

function reap($label, array $links) {
    foreach ($links as $link) {
        $result = mysqli_reap_async_query($link);
        if (is_object($result)) {
            printf("%s, row=%s\n", $label, json_encode($result->fetch_assoc()));
            mysqli_free_result($result);
        } else if (false !== $result) {
            printf("%s, output=%s\n", $label, $link);
        } else {
            printf("%s, error=%s\n", $label, mysqli_error($link));
        }
    }
}

And now run it.

$ php mysqli_poll.php
start i=0 @ T+0.000003
check i=0, count(read, error, reject)=(1, 0, 0) @ T+0.001007
read, error=You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1
finish i=0

start i=1 @ T+0.001256
check i=1, count(read, error, reject)=(1, 0, 1) @ T+0.001327
read, row={"num":"2"}
reject, error=You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1
finish i=1

start i=2 @ T+0.001627
check i=2, count(read, error, reject)=(0, 0, 2) @ T+1.503261
timeout, no results
finish i=2

start i=3 @ T+1.503564
check i=3, count(read, error, reject)=(1, 0, 2) @ T+2.001390
read, row={"wait":"0","num":"1"}
reject, error=You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1
reject, error=
finish i=3

Much better. Each query resolves in its own good time, with appropriate arrays filled out. The important difference in this example, versus earlier is that we iterate over each of the modified arrays. This runs contrary to some documentation, which shows iterating over all the links themselves.

I've opened documentation bug #70505 on it.

like image 123
bishop Avatar answered Nov 06 '22 05:11

bishop