
Why is the runtime of building a heap by inserting elements worse than using heapify?

In the CLRS book, building a heap in one pass with repeated sift-down calls (BUILD-MAX-HEAP) has complexity O(n). A heap can also be built by repeatedly inserting elements, which has complexity O(n lg n) in the worst case.

My question is: is there a simple insight into why the latter method performs worse?

I'm asking because I suspect there is a simple insight behind the math. For example, quicksort, merge sort, and heapsort all avoid unnecessary comparisons, but in different ways:

quicksort: after a balanced partition, nothing in the left part needs to be compared with anything in the right part.

merge sort: only the two smallest remaining elements of the two sub-arrays are ever compared.

heapsort: if A is larger than B, then A is larger than all of B's descendants, so no comparisons with them are needed.

SuperBald asked Mar 21 '23 18:03


1 Answer

The main difference between the two is the direction in which they work: upwards (the O(n log n) algorithm) or downwards (the O(n) algorithm).

In the O(n log n) algorithm, done by making n insertions, each insertion might potentially bubble an element from the bottom of the (current) heap all the way up to the top. Imagine that you've built all of the heap except the last full layer, and that every value you insert into that layer is the smallest value seen so far. Each such insertion has to bubble the new element all the way up to the root. At that point the heap has height (roughly) log n - 1, and about n / 2 of the insertions go into that last layer, so the total number of swaps is (roughly) n log n / 2 - n / 2, giving a runtime of Θ(n log n) in the worst case.
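A minimal sketch of the insertion-based build (assuming a min-heap stored in a Python list; the function names are illustrative, not from any library). Feeding it a strictly decreasing sequence triggers exactly the worst case described above: every inserted element is the smallest so far and bubbles all the way to the root.

```python
def insert(heap, value):
    """Append value and bubble it up; return the number of swaps performed."""
    heap.append(value)
    i = len(heap) - 1
    swaps = 0
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:
            break  # min-heap property restored
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent
        swaps += 1
    return swaps

def build_by_insertion(values):
    """Build a min-heap by n successive insertions; also return total swaps."""
    heap = []
    total_swaps = sum(insert(heap, v) for v in values)
    return heap, total_swaps

# Worst case: decreasing input, so every insertion bubbles to the root.
# Each element's swap count equals the depth of its slot, and the sum of
# all depths in the tree is Theta(n log n).
heap, swaps = build_by_insertion(range(15, 0, -1))
```

For 15 decreasing elements the swap count is the sum of all node depths in a complete 15-node tree (0 + 2·1 + 4·2 + 8·3 = 34), matching the "sum of paths to the root" bound.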

In the O(n) algorithm, done by building the heap in one pass, each element starts at the top of some small subheap and is bubbled down. Intuitively, the vast majority of the nodes sit near the bottom of the heap, where there is very little distance to bubble down, so most of the work is cheap; only the few nodes near the top can move far.

The major difference in the runtimes comes down to this direction. In the O(n log n) version, since elements are bubbled up, the runtime is bounded by the sum of the lengths of the paths from each node to the root of the tree, which is Θ(n log n). In the O(n) version, the runtime is bounded by the sum of the lengths of the paths from each node to the leaves of the tree, which is much lower (O(n)), hence the better runtime.
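For contrast, a sketch of the one-pass, bottom-up build (the BUILD-MAX-HEAP idea from CLRS, written here for a min-heap; names are again illustrative). Each node is bubbled down, so the work per node is bounded by its distance to the leaves rather than to the root:

```python
def sift_down(heap, i):
    """Bubble heap[i] down until the min-heap property holds; return swap count."""
    n = len(heap)
    swaps = 0
    while True:
        smallest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and heap[child] < heap[smallest]:
                smallest = child
        if smallest == i:
            return swaps
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest
        swaps += 1

def build_by_sift_down(values):
    """One-pass build: sift down every internal node, last one first."""
    heap = list(values)
    total_swaps = sum(sift_down(heap, i)
                      for i in range(len(heap) // 2 - 1, -1, -1))
    return heap, total_swaps

# Same decreasing input as before: the swaps are now bounded by the sum of
# node heights, which is O(n) (at most 11 for a complete 15-node tree),
# instead of the 34 swaps the insertion-based build needs.
heap, swaps = build_by_sift_down(range(15, 0, -1))
```

The leaves (half the nodes) are never touched at all, which is the concrete form of the "most of the work is near the bottom, where there is nothing to do" intuition.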

Hope this helps!

templatetypedef answered Apr 28 '23 18:04