
PHP Arrays - Remove duplicates ( Time complexity )

Tags:

algorithm

time-complexity

php

Okay, this is not a question of "how to get all uniques" or "How to remove duplicates from my array in PHP". This is a question about time complexity.

I figured that array_unique is roughly O(n^2 - n), and here's my implementation:

function array_unique2($array) 
{ 
    $to_return = array(); 
    $current_index = 0;

    for ( $i = 0 ; $i < count($array); $i++ ) 
    { 
        $current_is_unique = true; 

        for ( $a = $i+1; $a < count($array); $a++ ) 
        { 
            if ( $array[$i] == $array[$a] ) 
            { 
                $current_is_unique = false; 
                break; 
            } 
        } 
        if ( $current_is_unique ) 
        { 
            // Advance the index so earlier results are not overwritten.
            $to_return[$current_index++] = $array[$i];
        } 

    } 

    return $to_return; 
}

However, when benchmarking this against array_unique, I got the following result:

Testing (array_unique2)... Operation took 0.52146291732788 s.

Testing (array_unique)... Operation took 0.28323101997375 s.

That makes array_unique about twice as fast. My question is: why? (Both were given the same random data.)

And a friend of mine wrote the following:

function array_unique2($a)
{
    $n = array();
    foreach ($a as $k=>$v)
        if (!in_array($v,$n))
            $n[$k]=$v;
    return $n;
}

which is twice as fast as the built-in one in PHP.

I'd like to know, why?

What is the time complexity of array_unique and in_array?

Edit: I removed the count($array) calls from both loops and stored the count in a variable at the top of the function instead. That saved 2 seconds on 100 000 elements!
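A minimal sketch of the change described in the edit. The function name array_unique_hoisted and the variable $count are hypothetical (not from the original code), the input is assumed to be a list with sequential integer keys, and results are appended with [] rather than tracked by a manual index:

```php
<?php
// Sketch only: same nested-loop idea as above, with count() hoisted.
function array_unique_hoisted(array $array): array
{
    $to_return = array();
    $count = count($array); // evaluated once, not on every loop test

    for ($i = 0; $i < $count; $i++) {
        $current_is_unique = true;

        // An element is kept only if no equal element follows it.
        for ($a = $i + 1; $a < $count; $a++) {
            if ($array[$i] == $array[$a]) {
                $current_is_unique = false;
                break;
            }
        }

        if ($current_is_unique) {
            $to_return[] = $array[$i]; // append to the result list
        }
    }

    return $to_return;
}
```

The algorithm is still quadratic; hoisting count() only removes a repeated function call from each loop test.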


asked Jan 25 '09 18:01

Filip Ekberg

2 Answers

While I can't speak for the native array_unique function, I can tell you that your friend's algorithm is faster because:

He uses a single foreach loop as opposed to your double for() loop.
Foreach loops tend to perform faster than for loops in PHP.
He used a single if(! ) comparison while you used two if() structures.
The only additional function call your friend made was in_array, whereas you called count() twice.
You made three variable declarations that your friend didn't have to ($a, $current_is_unique, $current_index).

While none of these factors alone is huge, I can see where the cumulative effect would make your algorithm take longer than your friend's.

answered Oct 12 '22 01:10

Noah Goodrich

The time complexity of in_array() is O(n). To see this, we'll take a look at the PHP source code.

The in_array() function is implemented in ext/standard/array.c. All it does is call php_search_array(), which contains the following loop:

while (zend_hash_get_current_data_ex(target_hash, (void **)&entry, &pos) == SUCCESS) {

    // checking the value...

    zend_hash_move_forward_ex(target_hash, &pos);
}

That's where the linear characteristic comes from.

This is the overall characteristic of the algorithm because zend_hash_move_forward_ex() runs in constant time: looking at Zend/zend_hash.c, we see that it's basically just

*current = (*current)->pListNext;
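For illustration, the same linear scan can be sketched in userland PHP. This is only a rough analogue of the C loop above, not the actual implementation: the function name linear_search is hypothetical, and loose (==) comparison is assumed, as with in_array() when $strict is not set.

```php
<?php
// Hypothetical userland analogue of in_array() with loose comparison.
function linear_search($needle, array $haystack): bool
{
    foreach ($haystack as $value) { // one step per element: O(n) overall
        if ($value == $needle) {    // the "checking the value" part
            return true;
        }
    }

    return false; // every element was visited without a match
}
```

Calling a scan like this once per input element, as the friend's function does with in_array(), gives quadratic behaviour in the worst case where all values are distinct.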

As for the time complexity of array_unique():

First, a copy of the array is created, which is a linear operation.
Then, a C array of struct bucketindex is created and pointers into our array's copy are put into these buckets: linear again.
Then, the bucketindex array is sorted using quicksort: O(n log n) on average.
Lastly, the sorted array is walked and duplicate entries are removed from our array's copy. This should be linear again, assuming that deletion from our array is a constant-time operation.
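The four steps above can be sketched in PHP roughly as follows. This is a simplified illustration under stated assumptions, not the C implementation: sort_unique is a hypothetical name, usort() stands in for the internal quicksort, and values are compared as strings, which matches array_unique()'s default comparison.

```php
<?php
// Sketch of the copy / index / sort / walk approach (hypothetical helper).
function sort_unique(array $array): array
{
    $result = $array; // step 1: work on a copy (linear)

    // Step 2: build an index of [position, key, string value] (linear).
    $index = [];
    $pos = 0;
    foreach ($array as $key => $value) {
        $index[] = [$pos++, $key, (string) $value];
    }

    // Step 3: sort by value, then by original position (n log n on average),
    // so the first occurrence of each value leads its run of duplicates.
    usort($index, function (array $a, array $b): int {
        return strcmp($a[2], $b[2]) ?: ($a[0] <=> $b[0]);
    });

    // Step 4: walk the sorted index and drop every entry that equals
    // its predecessor (linear, given constant-time unset()).
    $n = count($index);
    for ($i = 1; $i < $n; $i++) {
        if ($index[$i][2] === $index[$i - 1][2]) {
            unset($result[$index[$i][1]]);
        }
    }

    return $result;
}
```

The sort dominates the cost, which gives O(n log n) on average for the whole function.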

Hope this helps ;)

answered Oct 12 '22 01:10

Christoph
