In a very tight loop I need to access tens of thousands of values in an array containing millions of elements. The key can be undefined; in that case it shall be legal to return null without any error message:
Array key exists: return the element's value. Array key does not exist: return null.
I do know multiple solutions:
```php
if (isset($lookup_table[$key])) {
    return $lookup_table[$key];
} else {
    return null;
}
```
or
```php
return @$lookup_table[$key];
```
or
```php
error_reporting(0);
$return = $lookup_table[$key];
error_reporting(E_ALL);
return $return;
```
All of these solutions are far from optimal.
My question is whether I am missing a way to avoid error handling and still get the value with a single B-tree lookup.
The array caches the results of a complex calculation - too complex to be done in real time. Out of billions of possible values, only millions yield a valid result. The array looks like `1234567 => 23457, 1234999 => 74361, ...`. It is saved to a PHP file of several megabytes and `include_once`-d at the beginning of execution. Initial load time does not matter. If the key is not found, it simply means that this specific value will not return a valid result. The trouble is doing this 50,000+ times per second.
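As a minimal sketch of the setup described above (the inline table stands in for the generated multi-megabyte cache file, and the `lookup()` helper name is an assumption, not from the original):

```php
<?php
// Stand-in for the generated cache file, which would contain something like:
//   return [1234567 => 23457, 1234999 => 74361, /* millions more */];
$lookup_table = [1234567 => 23457, 1234999 => 74361];

// Hypothetical helper: return the cached result, or null when the key
// (a value with no valid result) is absent - without any error message.
function lookup(array $table, int $key)
{
    return isset($table[$key]) ? $table[$key] : null;
}

var_dump(lookup($lookup_table, 1234567)); // int(23457)
var_dump(lookup($lookup_table, 42));      // NULL
```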
As no way was found to get the value with a single lookup and without error handling, I have trouble accepting a single answer. Instead, I upvoted all the great contributions.
The most valuable inputs were:
There was a lot of confusion about how PHP handles arrays. If you check the source code, you will see that all arrays are hash tables. Building your own lookup methods is common in C and C++, but is rarely worthwhile in higher-level scripting languages like PHP.
The array_keys() function returns an array containing the keys.
The array_key_exists() function checks an array for a specified key, and returns true if the key exists and false if it does not.
PHP lets you create two types of array: indexed and associative. Indexed arrays have numeric indices, typically starting from zero, so the first element has an index of 0, the second an index of 1, and so on. Usually, you use an indexed array when you want to store a bunch of data in a certain order.
Since PHP 7 you can accomplish this with the null coalesce operator:
```php
return $table[$key] ?? null;
```
First of all, PHP arrays are not implemented as a B-tree; they are hash tables: an array of buckets (indexed via a hash function), each with a linked list of actual values (to handle hash collisions). This means that lookup times depend on how well the hash function has "spread" the values across the buckets, i.e. the number of hash collisions is an important factor.
Technically, this statement is the most correct:
```php
return array_key_exists($key, $table) ? $table[$key] : null;
```
This introduces a function call, however, and is therefore much slower than the optimized language construct isset(). How much? ~2e3 times slower.
Next up is using a reference to avoid the second lookup:
```php
$tmp = &$lookup_table[$key];
return isset($tmp) ? $tmp : null;
```
Unfortunately, this modifies the original $lookup_table array if the item does not exist, because PHP always makes references valid by creating the missing entry.
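The side effect described above can be shown with a short sketch: taking a reference to a missing key silently inserts that key (with a null value) into the array.

```php
<?php
$lookup_table = ['a' => 1];

$tmp = &$lookup_table['missing']; // creates $lookup_table['missing'] = null
unset($tmp);                      // breaks the reference, but the entry remains

var_dump(array_key_exists('missing', $lookup_table)); // bool(true)
var_dump(count($lookup_table));                       // int(2)
```

This is why the reference trick is unsuitable for a read-only lookup table: every miss grows the array.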
That leaves the following method, which is much like your own:
```php
return isset($lookup_table[$key]) ? $lookup_table[$key] : null;
```
Besides not having the side effect of references, it's also faster at runtime, even though it performs the lookup twice.
You could look into dividing your arrays into smaller pieces as one way to mitigate long lookup times.
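One hedged sketch of that idea - splitting the flat table into shards keyed by the integer modulo a shard count (the shard count, helper names, and lazy-loading remark are assumptions, not from the original):

```php
<?php
const SHARD_COUNT = 16; // assumed; tune for your data

function shard_for(int $key): int
{
    return $key % SHARD_COUNT;
}

// Build shards from a flat table. In practice each shard could live in
// its own generated PHP file and be included only when first needed.
$flat = [1234567 => 23457, 1234999 => 74361];
$shards = array_fill(0, SHARD_COUNT, []);
foreach ($flat as $k => $v) {
    $shards[shard_for($k)][$k] = $v;
}

function shard_lookup(array $shards, int $key)
{
    $shard = $shards[shard_for($key)];
    return isset($shard[$key]) ? $shard[$key] : null;
}

var_dump(shard_lookup($shards, 1234567)); // int(23457)
var_dump(shard_lookup($shards, 5));       // NULL
```

Note that a hash-table lookup is already close to O(1), so sharding mainly helps with memory pressure and load time rather than per-lookup speed.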
I did some benchmarking with the following code:
```php
set_time_limit(100);

$count = 2500000;
$search_index_end = $count * 1.5;
$search_index_start = $count * .5;

$array = array();
for ($i = 0; $i < $count; $i++) {
    $array[md5($i)] = $i;
}

// isset() with ternary
$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
    $key = md5($i);
    $test = isset($array[$key]) ? $array[$key] : null;
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";

// array_key_exists() with ternary
$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
    $key = md5($i);
    $test = array_key_exists($key, $array) ? $array[$key] : null;
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";

// error suppression operator
$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
    $key = md5($i);
    $test = @$array[$key];
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";

// error reporting disabled
$error_reporting = error_reporting();
error_reporting(0);
$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
    $key = md5($i);
    $test = $array[$key];
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";
error_reporting($error_reporting);

// reference with isset()
$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
    $key = md5($i);
    $tmp = &$array[$key];
    $test = isset($tmp) ? $tmp : null;
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";
```
and I found that the fastest-running test was the one using isset($array[$key]) ? $array[$key] : null, followed closely by the solution that just disables error reporting.