Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PHP in_array() horrible performance. Fatest way to search array for value

Tags:

arrays

php

I have the following simple code to test against collision on a primary key I am creating:

$machine_ids = array();

for($i = 0; $i < 100000; $i++) {
    //Generate machine id returns a 15 character alphanumeric string
    $mid = Functions::generate_machine_id();

    if(in_array($mid, $machine_ids)) {
        die("Collision!");
    } else {
        $machine_ids[] = $mid;  
    }
}

die("Success!");

Any idea why this is taking many minutes to run? Anyway to speed it up?

like image 719
Justin Avatar asked May 23 '11 07:05

Justin


2 Answers

If you need best performance for your case, you need store your data as array key and use isset or array_key_exists(since php >= 7.4 array_key_exists is now as fast as isset) instead in_array.

Attention. It is true that isset on a hash map is faster than searching through an array for a value (in_array), but keep in mind that converting an array of values, ["foo", "bar", "baz"], to a hash map, ["foo" => true, "bar" => true, "baz" => true], incurs a memory cost (as well as potentially constructing the hash map, depending on how and when you do it). As with all things, you'll have to weigh the pros & cons for each case to determine if a hash map or array (list) of values works best for your needs. This isn't specific to PHP but more of a general problem space of computer science.

And some performance tests from https://gist.github.com/alcaeus/536156663fac96744eba77b3e133e50a

<?php declare(strict_types = 1);

function testPerformance($name, Closure $closure, $runs = 1000000)
{
    $start = microtime(true);
    for (; $runs > 0; $runs--)
    {
        $closure();
    }
    $end = microtime(true);

    printf("Function call %s took %.5f seconds\n", $name, $end - $start);
}

$items = [1111111];
for ($i = 0; $i < 100000; $i++) {
    $items[] = rand(0, 1000000);
}
$items = array_unique($items);
shuffle($items);

$assocItems = array_combine($items, array_fill(0, count($items), true));

$in_array = function () use ($items) {
    in_array(1111111, $items);
};

$isset = function () use ($assocItems) {
    isset($items[1111111]);
};

$array_key_exists = function () use ($assocItems) {
    array_key_exists(1111111, $assocItems);
};

testPerformance('in_array', $in_array, 100000);
testPerformance('isset', $isset, 100000);
testPerformance('array_key_exists', $array_key_exists, 100000);

Output:

Function call in_array took 5.01030 seconds
Function call isset took 0.00627 seconds
Function call array_key_exists took 0.00620 seconds
like image 129
Arsen Avatar answered Sep 28 '22 12:09

Arsen


for($i = 0; $i < 100000; $i++) 
{
  //Generate machine id returns a 15 character alphanumeric string
  $mid = Functions::generate_machine_id();
  if (isset($machine_ids[$mid]))
  {
    die("Collision!");
  }
  $machine_ids[$mid] = true;
}
like image 30
ajreal Avatar answered Sep 28 '22 12:09

ajreal