Why is this insertion into a heap faster than insertion into an unsorted list?

After inserting 100000000 elements into my heap and unsorted list, it seems that the heap insertion is actually faster (12 seconds vs 20 seconds). Why is this? I believe heap insertion is O(logn) while unsorted list insertion is O(1). I also noticed that my heap insertion implementation doesn't actually scale with the number of inputs. This also confuses me.

Here is the code that I ran:

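#include <cstdlib>   // rand
#include <ctime>     // clock, CLOCKS_PER_SEC
#include <fstream>   // ofstream
// UnsortedPQ and HeapPQ are the asker's own class templates (definitions not shown here).
using namespace std;

int main ()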
{
    clock_t unsortedStart;
    clock_t heapStart;

    double unsortedDuration;
    double heapDuration;

    int num_pushes = 100000000;
    int interval = 10000;

    ofstream unsorted ("unsorted.txt");
    ofstream heap ("heap.txt");

    UnsortedPQ<int> unsortedPQ; 
    HeapPQ<int> heapPQ; 

    unsortedStart = clock();

    for (int i = 0; i < num_pushes; ++i)
    {
        if (i % interval == 0) {
            unsortedDuration = ( clock() - unsortedStart ) / (double) CLOCKS_PER_SEC;
            unsorted << unsortedDuration << " " << i << endl;
        }

        unsortedPQ.insertItem(rand() % 100);
    }

    heapStart = clock();
    for (int i = 0; i < num_pushes; ++i)
    {
        if (i % interval == 0) {
            heapDuration = ( clock() - heapStart ) / (double) CLOCKS_PER_SEC;
            heap << heapDuration << " " << i << endl;
        }
        heapPQ.insertItem(rand() % 100);
    }
    return 0;
}

This is the heap implementation of insert (uses std::vector):

template <class T>
void HeapPQ<T>::insertItem(T data) { 
    //insert into back of heap (std::vector)
    dataArray.push_back(data);
    int i = dataArray.size() - 1;

    //sifts the inserted element up
    while (i != 0 && dataArray[(i - 1) / 2] > dataArray[i]) {
        swap(dataArray[i], dataArray[(i - 1) / 2]);
        i = (i - 1) / 2;
    }
}

This is the unsorted list implementation of insert (uses std::list):

//pushes element to the back of a std::list
template <class T>
void UnsortedPQ<T>::insertItem(T data) { dataList.push_back(data); }

everett asked Nov 13 '17 04:11


1 Answer

Insertion into the heap is O(log n), which means that every insertion takes at most O(log n) steps. It does not mean that every insertion actually needs that many.

In your example, the average cost of inserting an element is O(1). Why is that?

For simplicity, let's assume you insert only 0s and 1s in random order (your current version inserts the numbers 0..99 via rand() % 100; the calculation is more complex, but the behavior stays the same). After 2*n elements have been inserted, there are about n 0s and n 1s in the heap, and the heap looks roughly as follows:

                                 0
                                0 0
                               00 00
                          ...............
                         0 0 0  0  0  0  0
                       11 11 11 11 11 11 11

So basically, 1s are all at the last level k and 0s are at levels 0..k-1.

  1. if a 1 is inserted, there is nothing to do (nothing above it is larger than 1).
  2. if a 0 is inserted, there is at most one swap (1s may be in the level above the last level, but not 2 levels above).

That means that on average we need only about 0.5 swaps and not k.
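
A quick way to check this argument is to count the swaps directly. The following is a minimal sketch, not the asker's code: pushCountingSwaps and averageSwapsZeroOne are hypothetical helper names, the heap is a plain std::vector<int> min-heap using the same sift-up loop as in the question, and std::mt19937 with a fixed seed stands in for rand().

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: pushes `value` into a binary min-heap stored in `heap`
// and returns how many swaps the sift-up performed.
int pushCountingSwaps(std::vector<int>& heap, int value) {
    heap.push_back(value);
    std::size_t i = heap.size() - 1;
    int swaps = 0;
    // Move the new element up while its parent is strictly larger.
    while (i != 0 && heap[(i - 1) / 2] > heap[i]) {
        std::swap(heap[i], heap[(i - 1) / 2]);
        i = (i - 1) / 2;
        ++swaps;
    }
    return swaps;
}

// Hypothetical helper: inserts n random values from {0, 1} and returns the
// average number of swaps per insertion.
double averageSwapsZeroOne(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> bit(0, 1);
    std::vector<int> heap;
    heap.reserve(n);
    long long totalSwaps = 0;
    for (std::size_t k = 0; k < n; ++k) {
        totalSwaps += pushCountingSwaps(heap, bit(gen));
    }
    return n == 0 ? 0.0 : static_cast<double>(totalSwaps) / n;
}
```

Calling averageSwapsZeroOne(100000, 1) and averageSwapsZeroOne(1000000, 1) should give averages well below 1 that barely change with n, even though log2(n) grows from about 17 to about 20.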

Since both have the same asymptotic running time, it all comes down to the (amortized) cost of inserting into a vector versus a list. The list seems to be slower. My assumption would be that for every insert it needs to allocate a node dynamically via new, which is a fairly slow operation. The cost also depends on other factors, e.g. the size of the inserted objects, so which one is faster may vary.
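
The allocation assumption can be checked without timing anything by counting allocator calls. This is only a sketch under stated assumptions: CountingAllocator, g_allocations, allocationsForList and allocationsForVector are hypothetical names introduced here, and the exact counts depend on the standard library implementation (for example the vector growth factor).

```cpp
#include <cstddef>
#include <list>
#include <new>
#include <vector>

// Global counter of allocate() calls (illustrative only, not thread-safe).
static std::size_t g_allocations = 0;

// Minimal allocator that counts how often a container asks for memory.
template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Number of allocations needed to push_back n ints into a std::list.
std::size_t allocationsForList(std::size_t n) {
    g_allocations = 0;
    std::list<int, CountingAllocator<int>> l;
    for (std::size_t i = 0; i < n; ++i) l.push_back(static_cast<int>(i));
    return g_allocations;
}

// Number of allocations needed to push_back n ints into a std::vector.
std::size_t allocationsForVector(std::size_t n) {
    g_allocations = 0;
    std::vector<int, CountingAllocator<int>> v;
    for (std::size_t i = 0; i < n; ++i) v.push_back(static_cast<int>(i));
    return g_allocations;
}
```

For n insertions the list is expected to allocate at least once per element, while the vector only allocates when it has to grow, which happens a logarithmic number of times.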


Let's take a closer look at your case, where the numbers are generated by a uniform distribution over [0..99]. After n>>100 insertions we have the following situation (there is some hand-waving involved, but the gist should be clear):

  1. the last level (k-th) of the heap has n/2 elements and consists of the numbers 50..99. So for 50% of the possible numbers (i.e. 50..99) no shift is needed.
  2. the second-to-last level ((k-1)-th) of the heap has n/4 elements and consists of the numbers 25..49. That means for 25% of the possible numbers exactly 1 shift is needed.
  3. the level k-2 has n/8 elements and consists of the numbers 13..24, so for these numbers 2 shifts are needed.
  4. the levels more than log 100/log 2 above the last level contain only 0s. So the maximal number of shifts possible is m=log 100/log 2, independent of n, the number of elements in the heap.

So the worst-case cost of an insertion would be about log 100/log 2 (roughly 7 shifts), and the average cost is even smaller:

E(insertion) = 0*1/2 + 1*1/4 + 2*1/8 + ... <= 1.0

i.e. on average we have less than 1 shift per insertion.
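
To see how close the hand-waving estimate is in practice, one can record how many shifts each insertion actually needs. The sketch below is an illustration only: swapHistogram and meanSwaps are hypothetical helpers, std::mt19937 with a fixed seed replaces rand() % 100, and the measured fractions need not match the idealized 1/2, 1/4, 1/8 exactly. The point to look for is that the average stays a small constant as n grows.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: inserts n values drawn uniformly from 0..99 into a
// min-heap and returns hist, where hist[s] is the number of insertions that
// needed exactly s swaps during sift-up.
std::vector<long long> swapHistogram(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> value(0, 99);
    std::vector<int> heap;
    heap.reserve(n);
    std::vector<long long> hist;
    for (std::size_t k = 0; k < n; ++k) {
        heap.push_back(value(gen));
        std::size_t i = heap.size() - 1;
        std::size_t swaps = 0;
        while (i != 0 && heap[(i - 1) / 2] > heap[i]) {
            std::swap(heap[i], heap[(i - 1) / 2]);
            i = (i - 1) / 2;
            ++swaps;
        }
        if (hist.size() <= swaps) hist.resize(swaps + 1, 0);
        ++hist[swaps];
    }
    return hist;
}

// Average number of swaps per insertion implied by a histogram.
double meanSwaps(const std::vector<long long>& hist) {
    long long insertions = 0;
    long long swaps = 0;
    for (std::size_t s = 0; s < hist.size(); ++s) {
        insertions += hist[s];
        swaps += static_cast<long long>(s) * hist[s];
    }
    return insertions == 0 ? 0.0 : static_cast<double>(swaps) / insertions;
}
```

Comparing meanSwaps(swapHistogram(200000, 42)) with meanSwaps(swapHistogram(2000000, 42)) shows whether the average depends on n; dividing each hist[s] by n gives the observed fraction of insertions with s shifts.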

NB: This does not mean that inserting into a heap has amortized costs of O(1) in general. If you inserted distinct numbers in strictly decreasing order instead of in random order, every new element would have to travel all the way up to the root, and you would have costs of O(log n) per insertion.
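
The worst case is easy to reproduce with distinct values inserted in strictly decreasing order, so that each new element is smaller than everything already in the heap. The following sketch uses a hypothetical helper, totalSwapsDescending, and the same sift-up loop as in the question; it counts the total number of swaps for inserting n, n-1, ..., 1.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: inserts the distinct values n, n-1, ..., 1 into a
// min-heap and returns the total number of swaps performed.
long long totalSwapsDescending(std::size_t n) {
    std::vector<std::size_t> heap;
    heap.reserve(n);
    long long totalSwaps = 0;
    for (std::size_t k = 0; k < n; ++k) {
        heap.push_back(n - k);  // each value is smaller than all previous ones
        std::size_t i = heap.size() - 1;
        // The new minimum climbs all the way to the root.
        while (i != 0 && heap[(i - 1) / 2] > heap[i]) {
            std::swap(heap[i], heap[(i - 1) / 2]);
            i = (i - 1) / 2;
            ++totalSwaps;
        }
    }
    return totalSwaps;
}
```

Here every insertion needs as many swaps as the depth of the new leaf, so totalSwapsDescending(1023) is 8194, about 8 swaps per insertion, and that per-insertion figure keeps growing like log2(n) as n increases.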

ead answered Oct 23 '22 05:10