array_udiff
computes the difference between two arrays using a callback function. However, it requires a compare function instead of a predicate function.
A compare function compares item A relative to item B. A predicate function would just determine whether or not item A is equal to item B.
Compare functions are usually required by sort functions to determine the correct ordering. Since array_udiff
is just computing the differences, a predicate function that determines whether each pair is equal seems like it should suffice.
Why does array_udiff use a compare function instead of a predicate function? Does it matter if I use a predicate instead? i.e. Can I elect to just use the 0
and 1
return values to denote inequality and equality, discarding the -1
possibility? What adverse effect, if any, would this have on my results?
The implementation for php_array_diff() (which supplies implementation for a number of userspace array functions) works by reusing a number of internal comparison functions.
This is because those comparison functions already exist for other purposes, and meet the needed task at hand: determine if two items are equal or not. That they do a little extra work is unimportant; what's important is the relative reduction in code that needs to be considered. (An equals function could easily be written in terms of a comparison function, or as a separate entity, but now you've got two functions to do the same job.)
The actual implementation also works by sorting. So you need to use a comparison algorithm suitable for sorting, or you will get unexpected results. For example:
$a = [0, 1, 2, 3, 4, 5, 6];
$b = [4];
print_r(array_udiff($a, $b, function($x, $y) {
return $x <=> $y; //Sorting comparison function, correct
}));
print_r(array_udiff($a, $b, function($x, $y) {
return $x != $y; // Equality test, incorrect
}));
gives
Array //Sorting comparison function, correct
(
[0] => 0
[1] => 1
[2] => 2
[3] => 3
[5] => 5
[6] => 6
)
Array // Equality test, incorrect
(
[0] => 0
[1] => 1
[2] => 2
[3] => 3
[4] => 4 // equality test causes udiff to incorrectly include 4
[5] => 5
[6] => 6
)
The reason for this is the algorithm php_array_diff() uses. Basically it goes like this:
So, the algorithm relies on all of the inputs being sorted, and uses that fact, (and the the comparison function) so it only has to inspect each element in each input array once. If the comparison function doesn't result in an actually sorted array, the algorithm fails and you get a bad result.
HHVM may result in a different result, as HHVM uses a different sorting algorithm. HHVM uses a pure quicksort, while PHP uses a quicksort implementation derived from llvm that includes an insertion sort optimization.
Normally, different sorting algorithms arrive at the same solution via different means. That is, different algorithms cause elements to be compared in a different order, at different times, and in different quantities. In the case of an incorrect comparison function, this can have a large effect on the final order of the array.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With