Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating a heap with heapify vs heappush. Which one is faster?

Question

I have to create a priority queue storing distances. To build the heap I am thinking about the following two possibilities:

from heapq import heapify, heappush
n = 35000  # input size

# way A: using heapify
dist = []
for i in range(n):
  dist.push(distance)  # distance is computed in O(1) time
heapify(dist)

# way B: using heappush
dist = []
for i in range(n):
  heappush(dist, distance)  # distance is computed in O(1) time

Which one is faster?

Reasoning

According to the docs heapify() runs in linear time, and I'm guessing heappush() runs in O(log n) time. Therefore, the running time for each way would be:

  • A: O(2n) = O(n)
  • B: O(n log n)

However, it is counter intuitive for me that A is faster than B. Am I missing something? is it A really faster than B?

**EDIT

I've been testing with different inputs and different sizes of the array, and I am still not sure which one is faster.

After reading the link of the comment by Elisha, I understand how heapify() runs in linear time. However, I still don't know if using heappush() could be faster depending on the input.

I mean, heappush() has a worst case running time of O(log n), but in average will probably be smaller, depending on the input. Its best case running time is actually O(1). In the other hand heapify() has a best case running time of O(n), and must be called after filling the array, which takes also O(n). That makes a best case of O(2n).

So heappush() could be as fast as linear or as slow as O(n log n), whereas heapify() is going to take 2n time in any case. If we look at the worst case, heapify() will be better. But what about an average case?

Can we even be sure that one be faster than te other?

like image 916
Daniel Reina Avatar asked Feb 12 '17 11:02

Daniel Reina


1 Answers

Yes, we can be certain that one is faster than the other.

heap.push builds the heap from the bottom up. Each item is added to the end of the array and then "bubbled up" to its correct position. If you were building a min heap and you presented the items in reverse order, then every item you inserted would require log(n) (n being the current size of the heap) comparisons. So the worst case for building a heap by insertion is O(n log n).

Imagine starting with an empty heap and adding 127 items in reverse order (i.e. 127, 126, 125, 124, etc.). Each new item is smaller than all the other items, so every item will require the maximum number of swaps to bubble up from the last position to the top. The first item that’s added makes zero swaps. The next two items make one swap each. The next four items make two swaps each. Eight items make three swaps. 16 items make four swaps. 32 items make five swaps, and 64 items make six swaps.It works out to:

0 + 2*1 + 4*2 + 8*3 + 16*4 + 32*5 + 64*6
0 + 2 + 8 + 24 + 64 + 160 + 384 = 642 swaps

The worst case for build-heap is n swaps. Consider that same array of 127 items. The leaf level contains 64 nodes. build-heap starts at the halfway point and works its way backwards, moving things down as required. The next-to-last level has 32 nodes that at worst will move down one level. The next level up has 16 nodes that can't move down more than two levels. If you add it up, you get:

64*0 + 32*1 + 16*2 + 8*3 + 4*4 + 2*5 + 1*6
0 + 32 + 32 + 24 + 16 + 10 + 6 = 120 swaps

That's the absolute worst case for build-heap. It's O(n).

If you profile those two algorithms on an array of, say, a million items, you'll see a huge difference in the running time, with build-heap being much faster.

like image 198
Jim Mischel Avatar answered Oct 01 '22 01:10

Jim Mischel