How does array_diff()
work? It obviously couldn't work as follows:
function array_diff($arraya, $arrayb)
{
$diffs = array();
foreach ($arraya as $keya => $valuea)
{
$equaltag = 0;
foreach ($arrayb as $valueb)
{
if ($valuea == $valueb)
{
$equaltag =1;
break;
}
}
if ($equaltag == o)
{
$diffs[$keya]=$valuea;
}
}
return $diffs;
} //couldn't be worse than this
Does anyone know a better solution?
EDIT @animuson:
function array_diff($arraya, $arrayb)
{
foreach ($arraya as $keya => $valuea)
{
if (in_array($valuea, $arrayb))
{
unset($arraya[$keya]);
}
}
return $arraya;
}
The array_diff() function compares the values of two (or more) arrays, and returns the differences. This function compares the values of two (or more) arrays, and return an array that contains the entries from array1 that are not present in array2 or array3, etc.
Now, to check whether two arrays are equal or not, an iteration can be done over the arrays and check whether for each index the value associated with the index in both the arrays is the same or not. PHP has an inbuilt array operator( === ) to check the same but here the order of array elements is not important.
The array_diff() (manual) function can be used to find the difference between two arrays: $array1 = array(10, 20, 40, 80); $array2 = array(10, 20, 100, 200); $diff = array_diff($array1, $array2); // $diff = array(40, 80, 100, 200);
To display array structure and values in PHP, we can use two functions. We can use var_dump() or print_r() to display the values of an array in human-readable format or to see the output value of the program array.
user187291's suggestion to do it in PHP via hash tables is simply great! In a rush of adrenaline taken from this phantastic idea, I even found a way to speed it up a little more (PHP 5.3.1):
function leo_array_diff($a, $b) {
$map = array();
foreach($a as $val) $map[$val] = 1;
foreach($b as $val) unset($map[$val]);
return array_keys($map);
}
With the benchmark taken from user187291's posting:
LEO=0.0322 leo_array_diff()
ME =0.1308 my_array_diff()
YOU=4.5051 your_array_diff()
PHP=45.7114 array_diff()
The array_diff() performance lag is evident even at 100 entries per array.
Note: This solution implies that the elements in the first array are unique (or they will become unique). This is typical for a hash solution.
Note: The solution does not preserve indices. Assign the original index to $map and finally use array_flip() to preserve keys.
function array_diff_pk($a, $b) {
$map = array_flip($a);
foreach($b as $val) unset($map[$val]);
return array_flip($map);
}
PS: I found this while looking for some array_diff() paradoxon: array_diff() took three times longer for practically the same task if used twice in the script.
UPDATE
see below for faster/better code.
array_diff behaviour is much better in php 5.3.4, but still ~10 times slower than Leo's function.
also it's worth noting that these functions are not strictly equivalent to array_diff
since they don't maintain array keys, i.e. my_array_diff(x,y) == array_values(array_diff(x,y))
.
/UPDATE
A better solution is to use hash maps
function my_array_diff($a, $b) {
$map = $out = array();
foreach($a as $val) $map[$val] = 1;
foreach($b as $val) if(isset($map[$val])) $map[$val] = 0;
foreach($map as $val => $ok) if($ok) $out[] = $val;
return $out;
}
$a = array('A', 'B', 'C', 'D');
$b = array('X', 'C', 'A', 'Y');
print_r(my_array_diff($a, $b)); // B, D
benchmark
function your_array_diff($arraya, $arrayb)
{
foreach ($arraya as $keya => $valuea)
{
if (in_array($valuea, $arrayb))
{
unset($arraya[$keya]);
}
}
return $arraya;
}
$a = range(1, 10000);
$b = range(5000, 15000);
shuffle($a);
shuffle($b);
$ts = microtime(true);
my_array_diff($a, $b);
printf("ME =%.4f\n", microtime(true) - $ts);
$ts = microtime(true);
your_array_diff($a, $b);
printf("YOU=%.4f\n", microtime(true) - $ts);
result
ME =0.0137
YOU=3.6282
any questions? ;)
and, just for fun,
$ts = microtime(true);
array_diff($a, $b);
printf("PHP=%.4f\n", microtime(true) - $ts);
result
ME =0.0140
YOU=3.6706
PHP=19.5980
that's incredible!
The best solution to know how it works it to take a look at its source-code ;-)
(Well, that's one of the powers of open source -- and if you see some possible optimization, you can submit a patch ;-) )
For array_diff, it should be in ext/standard
-- which means, for PHP 5.3, it should be there : branches/PHP_5_3/ext/standard
And, then, the array.c
file looks like a plausible target ; the php_array_diff
function, line 3381, seems to correspond to array_diff
.
(Good luck going through the code : it's quite long...)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With