Trying to understand array_diff_uassoc optimization

It seems that the arrays are sorted before being compared with each other inside array_diff_uassoc.

What is the benefit of this approach?

Test script

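// Key comparison callback: prints each pair it is asked to compare,
// so the order of comparisons inside array_diff_uassoc becomes visible.
function compare($a, $b)
{
    echo "$a : $b\n";
    return strcmp($a, $b);
}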

$a = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
$b = array('v' => 1, 'w' => 2, 'x' => 3, 'y' => 4, 'z' => 5);
var_dump(array_diff_uassoc($a, $b, 'compare'));


$a = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
$b = array('d' => 1, 'e' => 2, 'f' => 3, 'g' => 4, 'h' => 5);
var_dump(array_diff_uassoc($a, $b, 'compare'));


$a = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
$b = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
var_dump(array_diff_uassoc($a, $b, 'compare'));

$a = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
$b = array('e' => 5, 'd' => 4, 'c' => 3, 'b' => 2, 'a' => 1);
var_dump(array_diff_uassoc($a, $b, 'compare'));

http://3v4l.org/DKgms#v526

P.S. It seems that the sorting algorithm changed in PHP 7.

sectus asked Mar 04 '15 04:03



2 Answers

The sorting algorithm didn't change in PHP 7. The elements are just passed to the sorting algorithm in a different order, for some performance improvements.

Well, the benefit could be faster execution in the end. You really hit the worst case when the two arrays have completely different keys.

The worst case is sorting both arrays and then comparing each key of one array against each key of the other: O(n * m + n * log(n) + m * log(m))

The best case is sorting both arrays and then making only as many comparisons as there are elements in the smaller array: O(min(m, n) + n * log(n) + m * log(m))

After a match, you don't have to compare against the full array again, only from the key following the match onward.
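The cost model above can be sketched in a few lines of PHP. This is only an illustration under stated assumptions, not the php-src code: the helper name diffKeysResumeAfterMatch is hypothetical, it compares keys only (as array_diff_ukey would) and ignores values, and it returns the surviving entries in sorted key order. It sorts both key lists, scans the second list for each key of the first, and resumes after the last match.

```php
<?php
// Hypothetical helper, key-only: models the cost described above,
// it is not the implementation used by php-src.
function diffKeysResumeAfterMatch(array $a, array $b): array
{
    $keysA = array_map('strval', array_keys($a));
    $keysB = array_map('strval', array_keys($b));
    sort($keysA, SORT_STRING);
    sort($keysB, SORT_STRING);

    $comparisons = 0;
    $start = 0;                 // first position in $keysB that can still match
    $count = count($keysB);
    $diff = [];
    foreach ($keysA as $keyA) {
        $matched = false;
        for ($i = $start; $i < $count; $i++) {
            $comparisons++;
            if (strcmp($keyA, $keysB[$i]) === 0) {
                $matched = true;
                $start = $i + 1; // resume after the match next time
                break;
            }
        }
        if (!$matched) {
            $diff[$keyA] = $a[$keyA];
        }
    }
    return ['diff' => $diff, 'comparisons' => $comparisons];
}

$a = ['a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5];
$same = diffKeysResumeAfterMatch($a, $a);
$disjoint = diffKeysResumeAfterMatch($a, ['v' => 1, 'w' => 2, 'x' => 3, 'y' => 4, 'z' => 5]);

echo $same['comparisons'], "\n";     // 5  (min(m, n) comparisons)
echo $disjoint['comparisons'], "\n"; // 25 (n * m comparisons)
```

With identical keys every lookup succeeds on its first comparison, while with disjoint keys every lookup scans the whole second list, which is where the n * m term comes from.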

But in the current implementation, the sorting is simply redundant. I think the implementation in php-src needs some improvement. There's no outright bug, but the implementation is just bad. If you understand some C: http://lxr.php.net/xref/PHP_TRUNK/ext/standard/array.c#php_array_diff (Note that this function is called from array_diff_uassoc via php_array_diff(INTERNAL_FUNCTION_PARAM_PASSTHRU, DIFF_ASSOC, DIFF_COMP_DATA_INTERNAL, DIFF_COMP_KEY_USER);)

bwoebi answered Nov 05 '22 21:11



Theory

Sorting allows for a few shortcuts to be made; for instance:

A      | B
-------+------
1,2,3  | 4,5,6

Each element of A will only be compared against B[0], because the other elements are known to be at least as big.

Another example:

A      | B
-------+-------
4,5,6  | 1,2,6

In this case, A[0] is compared against all elements of B, but A[1] and A[2] are compared against B[2] only.

If any element of A is bigger than all elements in B, you will get the worst performance.
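The two examples above can be reproduced with a standard two-pointer walk over sorted lists. This is a minimal sketch, assuming plain integer values and PHP 7+ (for the <=> operator); the function name sortedDiffWithLog is hypothetical, and the actual php-src code may walk the arrays differently. It records which pairs get compared so the shortcuts are visible.

```php
<?php
// Hypothetical helper: value diff over sorted lists that logs every comparison.
function sortedDiffWithLog(array $a, array $b): array
{
    sort($a);
    sort($b);

    $log = [];
    $diff = [];
    $j = 0;              // elements of $b before index $j are smaller than the current $a element
    $m = count($b);
    foreach ($a as $i => $value) {
        $matched = false;
        while ($j < $m) {
            $log[] = sprintf('A[%d] vs B[%d]', $i, $j);
            $c = $value <=> $b[$j];
            if ($c > 0) {
                $j++;    // B[$j] is too small for this and all later A elements
                continue;
            }
            $matched = ($c === 0);
            break;       // B[$j] is equal or bigger: later B elements cannot match
        }
        if (!$matched) {
            $diff[] = $value;
        }
    }
    return ['diff' => $diff, 'log' => $log];
}

print_r(sortedDiffWithLog([1, 2, 3], [4, 5, 6])['log']);
// A[0] vs B[0], A[1] vs B[0], A[2] vs B[0]

print_r(sortedDiffWithLog([4, 5, 6], [1, 2, 6])['log']);
// A[0] vs B[0], A[0] vs B[1], A[0] vs B[2], A[1] vs B[2], A[2] vs B[2]
```

The position in B never moves backwards, which is the shortcut that sorting makes possible.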

Practice

While the above works well for the standard array_diff() or array_udiff(), once a key comparison function is used it falls back to O(n * m) performance because of this change, which was made while trying to fix this bug.
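One way to observe this in practice is to count how often the key callback is invoked as the arrays grow. The sketch below assumes PHP 7 or later; countKeyComparisons is a hypothetical helper name, and the exact counts depend on the PHP version, so no expected numbers are given.

```php
<?php
// Hypothetical helper: runs array_diff_uassoc and counts key callback calls.
function countKeyComparisons(array $a, array $b): array
{
    $calls = 0;
    $diff = array_diff_uassoc($a, $b, function ($x, $y) use (&$calls) {
        $calls++;
        return strcmp((string) $x, (string) $y);
    });
    return ['diff' => $diff, 'calls' => $calls];
}

// Arrays with completely different keys, in growing sizes.
foreach ([5, 10, 20, 40] as $n) {
    $a = [];
    $b = [];
    for ($i = 0; $i < $n; $i++) {
        $a[sprintf('a%03d', $i)] = $i;
        $b[sprintf('z%03d', $i)] = $i;
    }
    $r = countKeyComparisons($a, $b);
    printf("n = m = %d: %d callback calls\n", $n, $r['calls']);
}
```

If the number of calls roughly quadruples each time n doubles, the key comparison is behaving quadratically on that PHP version.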

The aforementioned bug describes how custom key comparison functions can cause unexpected results when used with arrays that have mixed keys (i.e. numeric and string key values). I personally feel that this should've been addressed via the documentation, because you would get equally strange results with ksort().

Ja͢ck answered Nov 05 '22 20:11
