Okay this is not a question of "how to get all uniques" or "How to remove duplicates from my array in php". This is a question about the time complexity.
I figured that the array_unique
is somewhat O(n^2 - n) and here's my implementation:
function array_unique2($array)
{
$to_return = array();
$current_index = 0;
for ( $i = 0 ; $i < count($array); $i++ )
{
$current_is_unique = true;
for ( $a = $i+1; $a < count($array); $a++ )
{
if ( $array[$i] == $array[$a] )
{
$current_is_unique = false;
break;
}
}
if ( $current_is_unique )
{
$to_return[$current_index] = $array[$i];
}
}
return $to_return;
}
However when benchmarking this against the array_unique
i got the following result:
Testing (array_unique2)... Operation took 0.52146291732788 s.
Testing (array_unique)... Operation took 0.28323101997375 s.
Which makes the array_unique twice as fast, my question is, why ( Both had the same random data ) ?
And a friend of mine wrote the following:
function array_unique2($a)
{
$n = array();
foreach ($a as $k=>$v)
if (!in_array($v,$n))
$n[$k]=$v;
return $n;
}
which is twice as fast as the built in one in php.
I'd like to know, why?
What is the time-complexity of array_unique and in_array?
Edit I removed the count($array) from both loops and just used a variable in the top of the function, that gained 2 seconds on 100 000 elements!
The array_unique() function removes duplicate values from an array. If two or more array values are the same, the first appearance will be kept and the other will be removed.
An array is defined and duplicate elements from the array can be found and removed using the 'array_flip' function, that basically reverses the keys/index as values and values as keys.
We can remove duplicate element in an array by 2 ways: using temporary array or using separate index. To remove the duplicate element from array, the array must be in sorted order. If array is not sorted, you can sort it by calling Arrays.sort(arr) method.
While I can't speak for the native array_unique function, I can tell you that your friends algorithm is faster because:
While none of these factors alone is huge, I can see where the cumulative effect would make your algorithm take longer than your friends.
The time complexity of in_array()
is O(n). To see this, we'll take a look at the PHP source code.
The in_array()
function is implemented in ext/standard/array.c
. All it does is call php_search_array()
, which contains the following loop:
while (zend_hash_get_current_data_ex(target_hash, (void **)&entry, &pos) == SUCCESS) {
// checking the value...
zend_hash_move_forward_ex(target_hash, &pos);
}
That's where the linear characteristic comes from.
This is the overall characteristic of the algorithm, becaus zend_hash_move_forward_ex()
has constant behaviour: Looking at Zend/zend_hash.c
, we see that it's basically just
*current = (*current)->pListNext;
As for the time complexity of array_unique()
:
struct bucketindex
will be created and pointers into our array's copy will be put into these buckets - linear characteristic againbucketindex
-array will be sorted usign quicksort - n log
n on averageHope this helps ;)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With