After inserting 100000000 elements into my heap and my unsorted list, it seems that the heap insertion is actually faster (12 seconds vs 20 seconds). Why is this? I believe heap insertion is O(log n) while unsorted list insertion is O(1). I also noticed that my heap insertion implementation doesn't actually scale with the number of inputs. This also confuses me.
Here is the code that I ran:
#include <cstdlib>
#include <ctime>
#include <fstream>
using namespace std;

// UnsortedPQ and HeapPQ are my own classes; their insertItem implementations are shown below.

int main ()
{
    clock_t unsortedStart;
    clock_t heapStart;
    double unsortedDuration;
    double heapDuration;
    int num_pushes = 100000000;
    int interval = 10000;
    ofstream unsorted ("unsorted.txt");
    ofstream heap ("heap.txt");
    UnsortedPQ<int> unsortedPQ;
    HeapPQ<int> heapPQ;

    unsortedStart = clock();
    for (int i = 0; i < num_pushes; ++i)
    {
        if (i % interval == 0) {
            unsortedDuration = ( clock() - unsortedStart ) / (double) CLOCKS_PER_SEC;
            unsorted << unsortedDuration << " " << i << endl;
        }
        unsortedPQ.insertItem(rand() % 100);
    }

    heapStart = clock();
    for (int i = 0; i < num_pushes; ++i)
    {
        if (i % interval == 0) {
            heapDuration = ( clock() - heapStart ) / (double) CLOCKS_PER_SEC;
            heap << heapDuration << " " << i << endl;
        }
        heapPQ.insertItem(rand() % 100);
    }
    return 0;
}
This is the heap implementation of insert (it uses a std::vector):
template <class T>
void HeapPQ<T>::insertItem(T data) {
    // insert into the back of the heap (std::vector)
    dataArray.push_back(data);
    int i = dataArray.size() - 1;
    // sift the inserted element up
    while (i != 0 && dataArray[(i - 1) / 2] > dataArray[i]) {
        swap(dataArray[i], dataArray[(i - 1) / 2]);
        i = (i - 1) / 2;
    }
}
This is the unsorted list implementation of insert (it uses a std::list):
//pushes element to the back of a std::list
template <class T>
void UnsortedPQ<T>::insertItem(T data) { dataList.push_back(data); }
The insertion into the heap is O(log n), which means every insertion could take at most O(log n) steps. It does not mean it has to.
In your example the average cost of inserting an element is O(1). Why is that?
For simplicity, let's assume you insert only 0s and 1s in random order (in your current version only the numbers 0..99 (rand() % 100) are inserted; the calculation is more complex, but the behavior stays the same). After 2*n elements are inserted, there would be about n 0s and n 1s in the heap, and the heap would look as follows:
              0
          0       0
        0   0   0   0
      .................
     0   0   0   ...   0
    1 1 1 1 1 1 ... 1 1 1
So basically, the 1s are all at the last level k, and the 0s are at the levels 0..k-1.
If a 1 is inserted, there is nothing to do (there are no 2s above it). If a 0 is inserted, there is at most one swap (1s may be in the level above the last level, but not 2 levels above). That means on average we need only 0.5 swaps, and not k.
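You can check this claim empirically by counting the sift-up swaps directly. Below is a minimal, self-contained sketch (the helper name insertAndCountSwaps is mine, not from the code above) that inserts random 0s and 1s into a min-heap stored in a std::vector, using the same sift-up loop as insertItem, and prints the average number of swaps per insertion; it should come out well below 1:

#include <cstdlib>
#include <iostream>
#include <utility>
#include <vector>

// Insert value into the min-heap stored in heap and return the number
// of swaps performed while sifting it up (same loop as insertItem above).
int insertAndCountSwaps(std::vector<int>& heap, int value)
{
    heap.push_back(value);
    int i = heap.size() - 1;
    int swaps = 0;
    while (i != 0 && heap[(i - 1) / 2] > heap[i]) {
        std::swap(heap[i], heap[(i - 1) / 2]);
        i = (i - 1) / 2;
        ++swaps;
    }
    return swaps;
}

int main()
{
    const int n = 1000000;
    std::vector<int> heap;
    long long totalSwaps = 0;
    for (int k = 0; k < n; ++k)
        totalSwaps += insertAndCountSwaps(heap, rand() % 2);  // only 0s and 1s
    std::cout << "average swaps per insertion: " << totalSwaps / (double) n << '\n';
    return 0;
}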
Since both have the same asymptotic running time, it all comes down to the (amortized) costs of inserting into a vector versus into a list. The list seems to be slower (my assumption would be that for every insert it needs to allocate a node on the heap via new, and this is quite a slow operation; the costs depend on other factors, e.g. the size of the inserted objects, so which one is faster may vary).
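To see how much of that gap comes from the containers alone, you can strip away the priority-queue logic and time raw push_back calls on both containers. The following is only a rough sketch along the lines of the question's clock()-based timing (the element count is reduced to keep memory use modest), not a rigorous benchmark:

#include <ctime>
#include <iostream>
#include <list>
#include <vector>

int main()
{
    const int n = 10000000;  // fewer elements than in the question, to keep memory use modest

    std::vector<int> vec;
    clock_t start = clock();
    for (int i = 0; i < n; ++i)
        vec.push_back(i % 100);  // amortized O(1); occasional reallocation plus copy
    double vecSeconds = (clock() - start) / (double) CLOCKS_PER_SEC;

    std::list<int> lst;
    start = clock();
    for (int i = 0; i < n; ++i)
        lst.push_back(i % 100);  // O(1), but one heap allocation per node
    double listSeconds = (clock() - start) / (double) CLOCKS_PER_SEC;

    std::cout << "vector: " << vecSeconds << " s, list: " << listSeconds << " s\n";
    return 0;
}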
Let's take a closer look at your case, where the numbers are drawn from a uniform distribution over [0..99]. After n >> 100 insertions we will have the following situation (there is some hand-waving involved, but the gist should be clear):
- The last level (the k-th) of the heap has n/2 elements and consists of the numbers 50..99. So for 50% of the possible numbers (i.e. 50..99) no shift is needed.
- The level before it (the (k-1)-th) has n/4 elements and consists of the numbers 25..49. That means for 25% of the possible numbers exactly 1 shift is needed.
- The level k-2 has n/8 elements and consists of the numbers 13..24.
- ...
- The levels above level log 100 / log 2 contain only 0s. So the maximal number of shifts possible is m = log 100 / log 2, independent of n, the number of elements in the heap.

So the worst-case cost of an insertion would be log 100 / log 2, and the average cost is even smaller:
E(insertion)=0*1/2+1*1/4+2*1/8+...<=1.0
i.e. on average we have less than 1 shift per insertion.
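The same swap-counting experiment as before, now with the question's actual distribution (rand() % 100), should confirm this: the measured average stays below 1 swap per insertion. A self-contained sketch:

#include <cstdlib>
#include <iostream>
#include <utility>
#include <vector>

int main()
{
    const int n = 1000000;
    std::vector<int> heap;   // min-heap, organized like the question's HeapPQ
    long long totalSwaps = 0;
    for (int k = 0; k < n; ++k) {
        heap.push_back(rand() % 100);
        int i = heap.size() - 1;
        while (i != 0 && heap[(i - 1) / 2] > heap[i]) {  // sift up, counting swaps
            std::swap(heap[i], heap[(i - 1) / 2]);
            i = (i - 1) / 2;
            ++totalSwaps;
        }
    }
    std::cout << "average swaps per insertion: " << totalSwaps / (double) n << '\n';
    return 0;
}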
NB: That does not mean that inserting into the heap has an amortized cost of O(1). If you inserted the numbers not in random order, but first all the 99s, then all the 98s, ..., then all the 0s, the cost would be O(log n) per insertion.
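For contrast, here is a hedged sketch of that adversarial order: inserting values in descending order makes every new element a new minimum, so it bubbles all the way up to the root, and the average number of swaps grows like log2(n) instead of staying constant:

#include <cmath>
#include <iostream>
#include <utility>
#include <vector>

int main()
{
    const int n = 1000000;
    std::vector<int> heap;   // min-heap
    long long totalSwaps = 0;
    for (int k = 0; k < n; ++k) {
        heap.push_back(n - k);  // descending values: each new element is a new minimum
        int i = heap.size() - 1;
        while (i != 0 && heap[(i - 1) / 2] > heap[i]) {
            std::swap(heap[i], heap[(i - 1) / 2]);
            i = (i - 1) / 2;
            ++totalSwaps;
        }
    }
    std::cout << "average swaps per insertion: " << totalSwaps / (double) n
              << " (log2(n) is about " << std::log2(n) << ")\n";
    return 0;
}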