 

Why is my O(n log n) heapsort slower than my O(n^2) selection sort?

I have implemented two algorithms for sorting elements from highest to lowest.

The first one takes quadratic time on the real RAM model, and the second takes O(n log(n)) time. The second one uses a priority queue to achieve the reduction.

Here are the timings, taken from the output of the program below.

  • the first column is the size of the random array of integers
  • the second column is the time in seconds for the O(n^2) technique
  • the third column is the time in seconds for the O(n log(n)) technique

     9600  1.92663      7.58865
     9800  1.93705      7.67376
    10000  2.08647      8.19094
    

In spite of this great difference in complexity, the third column is larger than the second for the array sizes considered. Why is this so? Is the C++ priority_queue implementation slow?

I ran this code on Windows 7, compiled with Visual Studio 2012 (32-bit).

Here is the code:

#include "stdafx.h"
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <algorithm>
#include <vector>
#include <queue>
#include <Windows.h>
#include <assert.h>

using namespace std;

double time_slower_sort(vector<int>& a) 
{
    LARGE_INTEGER frequency, start,end;
    if (::QueryPerformanceFrequency(&frequency) == FALSE  ) exit(0);
    if (::QueryPerformanceCounter(&start)       == FALSE  ) exit(0);

    for(size_t i=0 ; i < a.size() ; ++i)  
    {

        vector<int>::iterator it = max_element( a.begin() + i ,a.end() ) ;
        int max_value = *it; 
        *it = a[i];
        a[i] = max_value;    

    }
    if (::QueryPerformanceCounter(&end) == FALSE) exit(0);
    return static_cast<double>(end.QuadPart - start.QuadPart) / frequency.QuadPart;
}



double time_faster_sort(vector<int>& a) 
{
    LARGE_INTEGER frequency, start,end;
    if (::QueryPerformanceFrequency(&frequency) == FALSE  ) exit(0);
    if (::QueryPerformanceCounter(&start)       == FALSE  ) exit(0);


    // Push into the priority queue. Logarithmic cost per insertion => O(n log(n)) total insertion cost
    priority_queue<int> pq;
    for(size_t i=0 ; i<a.size() ; ++i)
    {
        pq.push(a[i]);
    }

    // Read the elements back from the priority queue in order of priority.
    // Logarithmic cost per read => O(n log(n)) reading cost for the entire vector
    for(size_t i=0 ; i<a.size() ; ++i)
    {
        a[i] = pq.top();
        pq.pop();
    }
    if (::QueryPerformanceCounter(&end) == FALSE) exit(0);
    return static_cast<double>(end.QuadPart - start.QuadPart) / frequency.QuadPart;

}




int main(int argc, char** argv)
{
    // Iterate over vectors of different sizes and try out the two different variants
    for(size_t N=1000; N<=10000 ; N += 100 ) 
    {

        // initialize two vectors with identical random elements
        vector<int> a(N),b(N);

        // initialize with random elements
        for(size_t i=0 ; i<N ; ++i) 
        {
            a[i] = rand() % 1000; 
            b[i] = a[i];
        }

        // Sort the two different variants and time them  
        cout << N << "  " 
             << time_slower_sort(a) << "\t\t" 
             << time_faster_sort(b) << endl;

        // Sanity check
        for(size_t i=0 ; i<=N-2 ; ++i) 
        {
            assert(a[i] == b[i]); // both should return the same answer
            assert(a[i] >= a[i+1]); // else not sorted
        }

    }
    return 0;
}
asked Sep 01 '14 by smilingbuddha


2 Answers

the 3rd column is larger than the second for the array sizes considered.

The "Big O" notation only tells you how the time grows with the input size.

Your times are (or should be)

A + B*N^2          for the quadratic case,
C + D*N*LOG(N)     for the linearithmic case.

But it is entirely possible that C is much larger than A, leading to a higher execution time for the linearithmic code when N is small enough.

What makes linearithmic growth interesting is that if your input grew from 9600 to 19200 (doubling), the quadratic algorithm's execution time should roughly quadruple, going to about eight seconds, while the linearithmic algorithm should do little more than double its execution time.

So the execution time ratio would go from 2:8 to 8:16, i.e. the quadratic algorithm is now only twice as fast.

Double again the size of the input and 8:16 becomes 32:32; the two algorithms are equally fast when confronted with an input of around 40,000.

When tackling an input size of 80,000, the ratio is reversed: four times 32 is 128, while twice 32 is only 64. 128:64 means that the linearithmic algorithm is now twice as fast as the other.
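
To make that progression concrete, here is a tiny sketch of mine (not part of the original answer) that simply replays the simplified arithmetic above: times multiply by 4 per doubling for the quadratic algorithm and by 2 for the linearithmic one, starting from the measured 2 s and 8 s at N around 9600.

#include <cstdio>

int main()
{
    // Replays the doubling argument with the simplified growth factors:
    // x4 per doubling for the quadratic algorithm, x2 for the linearithmic
    // one (in reality a little more than x2).
    double n = 9600.0, t_quadratic = 2.0, t_linearithmic = 8.0;
    for (int step = 0; step < 4; ++step) {
        std::printf("N = %6.0f   quadratic: %4.0f s   linearithmic: %4.0f s\n",
                    n, t_quadratic, t_linearithmic);
        n *= 2.0;
        t_quadratic *= 4.0;
        t_linearithmic *= 2.0;
    }
    return 0;
}

It prints exactly the 2:8, 8:16, 32:32, 128:64 progression described above.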

You should run tests with very different sizes, maybe N, 2*N and 4*N, in order to get a better estimate of your A, B, C and D constants.
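
If you want to go beyond eyeballing, here is a minimal sketch of how that estimation could look. It assumes the simplified cost models above hold exactly, and the timing values in it are hypothetical placeholders to be replaced by your own measurements at N and 2N.

#include <cmath>
#include <cstdio>

int main()
{
    // Placeholder measurements (seconds) at N and 2*N -- substitute your own.
    const double N = 9600.0;
    const double t_quad_N = 1.93, t_quad_2N = 7.7;   // A + B*n^2 model
    const double t_lin_N  = 7.59, t_lin_2N  = 16.3;  // C + D*n*log(n) model

    // T(2N) - T(N) = B*(4N^2 - N^2)  =>  B = (T(2N) - T(N)) / (3*N^2)
    const double B = (t_quad_2N - t_quad_N) / (3.0 * N * N);
    const double A = t_quad_N - B * N * N;

    // T(2N) - T(N) = D*(2N*log(2N) - N*log(N))
    const double D = (t_lin_2N - t_lin_N) /
                     (2.0 * N * std::log(2.0 * N) - N * std::log(N));
    const double C = t_lin_N - D * N * std::log(N);

    // Scan for the input size where the linearithmic model becomes cheaper.
    for (double n = N; n <= 32.0 * N; n += 100.0) {
        if (C + D * n * std::log(n) < A + B * n * n) {
            std::printf("models cross near n = %.0f\n", n);
            break;
        }
    }
    return 0;
}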

What this all boils down to is, do not blindly rely on the Big O classification. Use it if you expect your input to grow large; but for small inputs, it may well be that a less scalable algorithm turns out to be more efficient.

For example, comparing an exponential-time algorithm against a logarithmic-time one, you can see that for small input sizes the faster algorithm is the one running in exponential time, by a factor of hundreds. But as soon as the input size grows past about nine, the exponential algorithm's running time skyrockets, while the other's doesn't.

You might even decide to implement both versions of an algorithm and use one or the other depending on input size. There are recursive algorithms that do exactly this, switching to an iterative implementation for the last levels of recursion. In a case like this you could implement the best algorithm for each size range, but the best compromise is to go with just two algorithms: the quadratic one up to around N = 15, and the O(N log N) one afterwards.
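
As a rough illustration of that compromise, here is a sketch of mine (the cutoff of 15 is taken from the example above; in practice you would tune it by measurement):

#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hybrid: below the cutoff, use the in-place selection sort (tiny constants);
// above it, switch to the heap-based O(n log n) approach.
void hybrid_sort_descending(std::vector<int>& a, std::size_t cutoff = 15)
{
    if (a.size() <= cutoff) {
        for (std::size_t i = 0; i + 1 < a.size(); ++i) {
            std::vector<int>::iterator it = std::max_element(a.begin() + i, a.end());
            std::iter_swap(a.begin() + i, it);
        }
    } else {
        std::make_heap(a.begin(), a.end(), std::greater<int>());
        std::sort_heap(a.begin(), a.end(), std::greater<int>()); // descending order
    }
}

The selection-sort branch is the same idea as the question's time_slower_sort, and the heap branch does in place what the priority_queue version does with an extra copy.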

I found a reference to Introsort, which

is a sorting algorithm that initially uses Quicksort, but switches to Heapsort when the recursion depth exceeds a level based on the logarithm of the number of elements being sorted, and uses Insertion sort for small cases because of its good locality of reference, i.e. when the data most likely resides in memory and easily referenced.

In the above case, the Insertion sort exploits memory locality, which means that its B constant is very small; a recursive algorithm is likely to incur higher costs, and have a significant value of C. So for small datasets, more compact algorithms do well even if their Big O classification is poorer.
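
For illustration only, here is a compressed sketch of that control flow. It is not the actual library implementation; the cutoff of 16 and the middle-element pivot are arbitrary choices of mine.

#include <algorithm>
#include <cmath>
#include <iterator>

// Quicksort until the recursion gets too deep, then heapsort; insertion sort
// for small ranges because of its locality of reference.
template <typename It>
void introsort_impl(It first, It last, int depth_limit)
{
    typedef typename std::iterator_traits<It>::value_type T;
    const typename std::iterator_traits<It>::difference_type n = std::distance(first, last);

    if (n < 16) {
        // Insertion sort: [first, i) is already sorted, rotate *i into place.
        for (It i = first; i != last; ++i)
            std::rotate(std::upper_bound(first, i, *i), i, std::next(i));
        return;
    }
    if (depth_limit == 0) {
        std::make_heap(first, last);   // too deep: fall back to heapsort
        std::sort_heap(first, last);
        return;
    }
    const T pivot = *std::next(first, n / 2);
    It mid = std::partition(first, last,
                            [&pivot](const T& x) { return x < pivot; });
    introsort_impl(first, mid, depth_limit - 1);
    introsort_impl(mid, last, depth_limit - 1);
}

template <typename It>
void introsort(It first, It last)
{
    const double n = static_cast<double>(std::distance(first, last));
    introsort_impl(first, last,
                   n > 1.0 ? static_cast<int>(2.0 * std::log(n) / std::log(2.0)) : 0);
}

Mainstream std::sort implementations follow essentially this pattern, which is why they behave well across the whole size range.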

answered by LSerni


I think the problem is more subtle than one would expect. In your O(N^2) solution you make no allocations: the algorithm works in place, finds the largest remaining element and swaps it into the current position. That part is fine.

But in the O(N log N) priority_queue version, the priority_queue internally uses a std::vector by default to store its state. When you push elements one by one, that vector occasionally has to grow (and it does), and that is time you are not losing in the O(N^2) version. If you make the following little change to the initialization of the priority_queue:

priority_queue<int> pq(a.begin(), a.end()); instead of the for loop

then the O(N log N) version beats the O(N^2) version by a fair margin, as it should. With the proposed change there is still an allocation in the priority_queue version, but only one (for large vectors you save many reallocations, and allocation is one of the more expensive operations), and the range constructor can take advantage of having the complete contents at once: it calls make_heap, which builds the heap in O(N) instead of N individual pushes.

Sample code (ready to compile and run):

#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <algorithm>
#include <vector>
#include <queue>
#include <Windows.h>
#include <assert.h>

using namespace std;

double time_slower_sort(vector<int>& a) {
    LARGE_INTEGER frequency, start, end;
    if (::QueryPerformanceFrequency(&frequency) == FALSE)
        exit(0);
    if (::QueryPerformanceCounter(&start) == FALSE)
        exit(0);

    for (size_t i = 0; i < a.size(); ++i) {

        vector<int>::iterator it = max_element(a.begin() + i, a.end());
        int max_value = *it;
        *it = a[i];
        a[i] = max_value;
    }
    if (::QueryPerformanceCounter(&end) == FALSE)
        exit(0);
    return static_cast<double>(end.QuadPart - start.QuadPart) /
           frequency.QuadPart;
}

double time_faster_sort(vector<int>& a) {
    LARGE_INTEGER frequency, start, end;
    if (::QueryPerformanceFrequency(&frequency) == FALSE)
        exit(0);
    if (::QueryPerformanceCounter(&start) == FALSE)
        exit(0);

    // Build the priority queue from the whole range in one go: a single
    // allocation plus O(n) heap construction (make_heap).
    priority_queue<int> pq(a.begin(), a.end());  // <----- THE ONLY CHANGE IS HERE

    // Read the elements back from the priority queue in order of priority.
    // Logarithmic cost per read => O(n log(n)) reading cost for the entire vector.
    for (size_t i = 0; i < a.size(); ++i) {
        a[i] = pq.top();
        pq.pop();
    }
    if (::QueryPerformanceCounter(&end) == FALSE)
        exit(0);
    return static_cast<double>(end.QuadPart - start.QuadPart) /
           frequency.QuadPart;
}

int main(int argc, char** argv) {
    // Iterate over vectors of different sizes and try out the two different
    // variants
    for (size_t N = 1000; N <= 10000; N += 100) {

        // initialize two vectors with identical random elements
        vector<int> a(N), b(N);

        // initialize with random elements
        for (size_t i = 0; i < N; ++i) {
            a[i] = rand() % 1000;
            b[i] = a[i];
        }

        // Sort the two different variants and time them
        cout << N << "  " << time_slower_sort(a) << "\t\t"
             << time_faster_sort(b) << endl;

        // Sanity check
        for (size_t i = 0; i <= N - 2; ++i) {
            assert(a[i] == b[i]);     // both should return the same answer
            assert(a[i] >= a[i + 1]); // else not sorted
        }
    }
    return 0;
}

On my PC (Core 2 Duo 6300) the output is:

1100  0.000753738      0.000110263
1200  0.000883201      0.000115749
1300  0.00103077       0.000124526
1400  0.00126994       0.000250698
...
9500  0.0497966        0.00114377
9600  0.051173         0.00123429
9700  0.052551         0.00115804
9800  0.0533245        0.00117614
9900  0.0555007        0.00119205
10000 0.0552341        0.00120466
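
If for some reason you want to keep the push-one-by-one loop, another option is to hand the priority_queue a vector that is already reserved, so none of the pushes trigger a reallocation. This is a sketch of mine, assuming the C++11 constructor that takes a comparator and a moved-in container:

#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

void sort_descending(std::vector<int>& a)
{
    std::vector<int> storage;
    storage.reserve(a.size());                 // one allocation up front
    std::priority_queue<int> pq(std::less<int>(), std::move(storage));

    for (std::size_t i = 0; i < a.size(); ++i)
        pq.push(a[i]);                         // O(log N) each, no growth

    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = pq.top();
        pq.pop();
    }
}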
answered by NetVipeC