
handling large arrays with array_diff

Tags: php

I have been trying to compare two arrays. Using array_intersect presents no problems. With array_diff, arrays of ~5,000 values work, but at ~10,000 values the script dies at the array_diff call. Turning on error_reporting did not produce any output.

I tried creating my own array_diff function:

function manual_array_diff($arraya, $arrayb) {
    foreach ($arraya as $keya => $valuea) {
        if (in_array($valuea, $arrayb)) {
            unset($arraya[$keya]);
        }
    }
    return $arraya;
}

source: How does array_diff work?

I would expect it to be less efficient than the official array_diff, but it can handle arrays of ~10,000 values. Unfortunately, both versions fail when I get to ~15,000.
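For comparison, here is a minimal sketch of a lookup-based diff that avoids the repeated linear scans of in_array. It assumes every value is an integer or a string (array_flip cannot use other types as keys), and the name lookup_array_diff is hypothetical. Whether it stays under the limit on the failing server is untested.

```php
<?php
// Hypothetical helper: diff via a key lookup instead of in_array().
// Assumption: all values in $b are ints or strings, so they can be array keys.
function lookup_array_diff(array $a, array $b): array
{
    // Turn the values of $b into keys for constant-time membership checks.
    $lookup = array_flip($b);

    $diff = [];
    foreach ($a as $key => $value) {
        // Keep the value (under its original key) when it does not occur in $b.
        if (!isset($lookup[$value])) {
            $diff[$key] = $value;
        }
    }
    return $diff;
}
```

Like array_diff, this keeps the keys of the first array, so the result can be used as a drop-in replacement under the stated assumption.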

I tried the same code on a different machine and it runs fine, so it's not an issue with the code or PHP. There must be some limit set somewhere on that particular server. Any idea how I can get around that limit or alter it or just find out what it is?

burger asked Jun 06 '10 20:06


3 Answers

Having encountered the exact same problem, I was really hoping for an answer here.

So, I had to find my own way around it and came up with the following ugly kludge that is working for me with arrays of around 50,000 elements. It is based on your observation that array_intersect works but array_diff doesn't.

Sooner or later this will also overflow the resource limitations, in which case it will be necessary to chunk the arrays and deal with smaller bits. We will cross that bridge when we come to it.

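function new_array_diff($arraya, $arrayb) {
    // array_intersect keeps the keys of $arraya for every value also found in $arrayb.
    $intersection = array_intersect($arraya, $arrayb);
    $diff = array(); // initialise so an empty result is still returned as an array
    foreach ($arraya as $keya => $valuea) {
        // A key missing from the intersection means the value is not in $arrayb.
        if (!array_key_exists($keya, $intersection)) {
            $diff[$keya] = $valuea;
        }
    }

    return $diff;
}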
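
The chunking mentioned above could look roughly like the following sketch. The function name chunked_array_diff and the default chunk size of 1000 are assumptions for illustration, and it has not been tested against the server limits discussed in the question.

```php
<?php
// Hypothetical helper: run array_diff on small slices so that no single call
// has to process both full arrays at once.
function chunked_array_diff(array $a, array $b, int $chunkSize = 1000): array
{
    // Split $b once; its keys do not matter for the comparison.
    $bChunks = array_chunk($b, $chunkSize);

    $result = [];
    // preserve_keys = true keeps the original keys of $a in every slice.
    foreach (array_chunk($a, $chunkSize, true) as $aChunk) {
        foreach ($bChunks as $bChunk) {
            // Remove whatever this slice of $b contains.
            $aChunk = array_diff($aChunk, $bChunk);
            if ($aChunk === []) {
                break; // nothing left to compare in this slice
            }
        }
        // Keys are unique across slices of $a, so the union loses nothing.
        $result += $aChunk;
    }
    return $result;
}
```

A smaller chunk size means more calls but less work per call; the right value depends on whichever limit the server is actually hitting.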

GeoNomad answered Nov 13 '22 20:11


In my php.ini:

max_execution_time = 60     ; Maximum execution time of each script, in seconds
memory_limit = 32M          ; Maximum amount of memory a script may consume

Could differences in these settings, or alternatively in machine performance, be causing the problems? Did you check your web server error logs (if you run this through one)?
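
To compare the two machines, a small script like this sketch could be run on each. The helper name report_limits is hypothetical. If the process crashes outright rather than hitting a PHP limit, nothing will be printed and only the web server log will show it.

```php
<?php
// Show errors in the output in case display_errors is off on the server.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical helper: collect the limits most likely to differ between machines.
function report_limits(): array
{
    return [
        'memory_limit'       => ini_get('memory_limit'),
        'max_execution_time' => ini_get('max_execution_time'),
        // Peak memory allocated so far by this script, in bytes.
        'peak_memory_bytes'  => memory_get_peak_usage(true),
    ];
}

print_r(report_limits());
```

Calling report_limits() again right after the array_diff step on the working machine gives a rough idea of how much memory the comparison needs.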

Lauri Lehtinen answered Nov 13 '22 19:11


You mentioned this is running in a browser. Try running the script via command line and see if the result is different.
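
The command line and the web server may load different configuration files, so it can help to print which one is in effect in each environment. A minimal sketch, with describe_environment as a hypothetical helper name:

```php
<?php
// Hypothetical helper: describe how and with which configuration PHP is running.
function describe_environment(): array
{
    return [
        'sapi'     => php_sapi_name(),       // e.g. "cli" on the command line
        'ini_file' => php_ini_loaded_file(), // path to the loaded php.ini, or false if none
        'version'  => PHP_VERSION,
    ];
}

print_r(describe_environment());
```

Running the same file once through the browser and once from the command line shows whether the two use the same php.ini.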

tipu answered Nov 13 '22 20:11