I have a question about the best way to search arrays quickly (I'm talking about a specific case).
Suppose I start with an array L = [A, B, C]. While the program is running, L may grow (but only at the end), so by the time I search it, it might be L = [A, B, C, D, E].
The point is that when I search, the values I want to find can only be D or E. Right now I'm using in_array(elem, array), but this function can't be "tweaked" to start at the end and work down through the indexes, and I'm "afraid" that every search will examine all the elements with lower indexes before it finds the value I'm looking for.
Is there another search function that fits my problem better? How does the in_array function work internally?
Thanks in advance
I assume that in_array is a linear search from 0 to n-1.
The fastest search will be to store the values as the keys and use array_key_exists.
$a['foo'] = true;
$a['bar'] = true;
if (array_key_exists('foo', $a)) ...
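Because the array in the question keeps growing, it may help to see that a key-based set can be extended as you go. This is only a minimal sketch, assuming the values are strings or integers (the only types PHP accepts as array keys); the variable name $seen is made up for illustration:

```php
<?php
// Hypothetical set of values, stored as array keys.
$seen = ['A' => true, 'B' => true, 'C' => true];

// Appending later is just another key assignment.
$seen['D'] = true;
$seen['E'] = true;

// Each check is a hash lookup, so it does not matter how late 'E' was added.
var_dump(array_key_exists('E', $seen)); // bool(true)
var_dump(array_key_exists('Z', $seen)); // bool(false)
```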
But if that's not an option, you can make your own for indexed arrays quite easily:
function in_array_i($needle, array $a, $i = 0)
{
    // Linear scan starting at index $i; assumes consecutive integer keys.
    $c = count($a);
    for (; $i < $c; ++$i) {
        if ($a[$i] == $needle) {
            return true;
        }
    }
    return false;
}
It will start at $i, which you can keep track of yourself in order to skip the first elements.
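Since the question asks about searching from the end, the same kind of loop can also run backwards. A minimal sketch, assuming a list-style array with consecutive integer keys starting at 0; the function name in_array_reverse and the $stop parameter are made up for illustration:

```php
<?php
/**
 * Hypothetical helper: scan from the last element down to index $stop.
 * Assumes $a has consecutive integer keys 0..count($a)-1.
 */
function in_array_reverse($needle, array $a, $stop = 0)
{
    for ($i = count($a) - 1; $i >= $stop; --$i) {
        // Loose comparison, matching in_array()'s default behaviour.
        if ($a[$i] == $needle) {
            return true;
        }
    }
    return false;
}

$l = ['A', 'B', 'C', 'D', 'E'];
var_dump(in_array_reverse('E', $l));    // found on the first comparison
var_dump(in_array_reverse('A', $l, 3)); // indexes 0..2 are skipped, so not found
```

If the recently appended values are the ones you usually look for, this loop tends to stop after a few comparisons, while a miss still costs a scan down to $stop.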
Or alternatively...
function in_array_i($needle, array $a, $i = 0)
{
    // array_slice() copies the tail of the array before in_array() scans it.
    return in_array($needle, $i ? array_slice($a, $i) : $a);
}
You can benchmark to see which is faster.
How does the in_array function work internally?
Internally, in_array() searches from the beginning to the end of the array, so in your case this is slow.
Depending on the nature of your data, you can change the search strategy. If you only have non-duplicate values and all values are either strings or integers (not NULL), a common trick is to array_flip() the array, which is pretty fast, and then check whether there is an entry for your value as a key in the array hash via isset():
$array = array( ... non-duplicate string and integer values ... );
$needle = 'find me!';
$lookup = array_flip($array);
$found = isset($lookup[$needle]) ? $lookup[$needle] : false;
if (false === $found) {
    echo "Not found!\n";
} else {
    echo "Found at {$found}!\n";
}
If these preconditions are not met, you can do what konforce suggested.
If you have a lot of data and the values you look for are not always near the beginning or the end, you might want to implement a search algorithm of your own, for example one that starts at a random position and wraps around, to distribute the search time.
Additionally, you can probably keep the elements sorted as you add them to the array, which then allows a much faster search with a fitting algorithm such as binary search.
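One fitting algorithm for a sorted array is binary search, which halves the remaining range on every step. A minimal sketch, assuming a list sorted in ascending order with consecutive integer keys and values of a single comparable type; the function name binary_search is made up for illustration and is not a PHP built-in:

```php
<?php
/**
 * Hypothetical helper: binary search in a list sorted in ascending order.
 * Returns the index of $needle, or -1 if it is absent.
 */
function binary_search(array $sorted, $needle)
{
    $lo = 0;
    $hi = count($sorted) - 1;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($sorted[$mid] === $needle) {
            return $mid;
        }
        if ($sorted[$mid] < $needle) {
            $lo = $mid + 1; // needle can only be in the upper half
        } else {
            $hi = $mid - 1; // needle can only be in the lower half
        }
    }
    return -1;
}

$values = [3, 8, 15, 23, 42];
var_dump(binary_search($values, 23)); // int(3)
var_dump(binary_search($values, 4));  // int(-1)
```

Keeping the array sorted on insertion has its own cost, so whether this pays off depends on how often you insert compared with how often you search.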
Tweaking an extensive comparative test of array lookup methods for numerical and string searches, by Kasim Kochkin posted on GitHub, I found the following results using PHP 7.3.11, calling array_flip once and then running multiple searches:
For a single search or a few searches, in_array and array_search are faster.
For string searches, flip (once) + isset becomes faster above 200 searches.
For numerical searches, flip (once) + isset becomes faster above 10 searches.
String search results (in seconds):
N (array size) | in_array | flip | isset | array_search | array_key_exists |
---|---|---|---|---|---|
1,000,000 | 0.00845003 | 0.17343211 | 2.86E-6 | 0.00835395 | 5.01E-6 |
100,000 | 0.00854707 | 0.12469196 | 7.15E-6 | 0.00861216 | 6.2E-6 |
10,000 | 0.00854087 | 0.10549212 | 6.91E-6 | 0.00846505 | 4.05E-6 |
Numerical search results (in seconds):
N (array size) | in_array | flip | isset | array_search | array_key_exists |
---|---|---|---|---|---|
1,000,000 | 0.01197696 | 0.06217289 | 6.2E-6 | 0.01673698 | 4.05E-6 |
100,000 | 0.01191092 | 0.06582093 | 6.91E-6 | 0.01637983 | 4.05E-6 |
10,000 | 0.01375008 | 0.07185006 | 5.01E-6 | 0.01485705 | 4.05E-6 |
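To check how this trade-off behaves on your own machine, a rough timing sketch along these lines could be used. It is not the GitHub benchmark mentioned above; the array size, the number of searches and all variable names are assumptions chosen for illustration, and absolute timings will differ between PHP versions and hardware:

```php
<?php
// Rough timing sketch; NOT the original GitHub benchmark.
$n = 100000;     // assumed array size
$searches = 500; // assumed number of lookups
$haystack = range(0, $n - 1);
$needles = [];
for ($i = 0; $i < $searches; ++$i) {
    $needles[] = mt_rand(0, 2 * $n); // roughly half of these are not in the array
}

// Strategy 1: one linear search per lookup.
$t = microtime(true);
$hitsLinear = 0;
foreach ($needles as $needle) {
    if (in_array($needle, $haystack, true)) {
        ++$hitsLinear;
    }
}
$linearTime = microtime(true) - $t;

// Strategy 2: flip once (included in the timing), then isset() per lookup.
$t = microtime(true);
$lookup = array_flip($haystack);
$hitsFlip = 0;
foreach ($needles as $needle) {
    if (isset($lookup[$needle])) {
        ++$hitsFlip;
    }
}
$flipTime = microtime(true) - $t;

printf("in_array: %.6f s, flip + isset: %.6f s\n", $linearTime, $flipTime);
```

Varying $searches should show the point at which paying for array_flip once becomes worthwhile.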