def heapify(A):
for root in xrange(len(A)//2-1, -1, -1):
rootVal = A[root]
child = 2*root+1
while child < len(A):
if child+1 < len(A) and A[child] > A[child+1]:
child += 1
if rootVal <= A[child]:
break
A[child], A[(child-1)//2] = A[(child-1)//2], A[child]
child = child *2 + 1
This is a similar implementation of python heapq.heapify(). It is said in the doc this function runs in O(n). But it looks like for n/2 elements, it does log(n) operations. Why is it O(n)?
The basic idea behind why the time is linear is due to the fact that the time complexity of heapify depends on where it is within the heap. It takes O ( 1 ) O(1) O(1) time when the node is a leaf node (which makes up at least half of the nodes) and O ( log n ) O(\log n) O(logn) time when it's at the root.
Understanding the linear time complexity of heapify. It is essentially a balanced binary tree with the property that the value of each parent node is less than or equal to any of its children for the MinHeap implementation and greater than or equal to any of its children for the MaxHeap implementation.
heapify − This function converts a regular list to a heap. In the resulting heap the smallest element gets pushed to the index position 0. But rest of the data elements are not necessarily sorted. heappush − This function adds an element to the heap without altering the current heap.
The heapsort algorithm itself has O(n log n) time complexity using either version of heapify.
It requires more careful analysis, such as you'll find here. The basic insight is that only the root of the heap actually has depth log2(len(a))
. Down at the nodes one above a leaf - where half the nodes live - a leaf is hit on the first inner-loop iteration.
Waving hands some, when the algorithm is looking at a node at the root of a subtree with N
elements, there are about N/2
elements in each subtree, and then it takes work proportional to log(N)
to merge the root and those sub-heaps into a single heap. So the total time T(N)
required is about
T(N) = 2*T(N/2) + O(log(N))
That's an uncommon recurrence. The Akra–Bazzi method can be used to deduce that it's O(N)
, though.
I think more informative, and certainly more satifsying, is to derive an exact solution from scratch. Toward that end, I'll only talk about complete binary trees: as full as possible on every level. Then there 2**N - 1
elements in total, and all subtrees are also complete binary trees. This sidesteps mounds of pointless details about how to proceed when things aren't exactly balanced.
When we're looking at a subtree with 2**k - 1
elements, its two subtrees have exactly 2**(k-1) - 1
elements each, and there are k
levels. For example, for a tree with 7 elements, there's 1 element at the root, 2 elements on the second level, and 4 on the third. After the subtrees are heapified, the root has to moved into place, moving it down 0, 1, or 2 levels. This requires doing comparisons between levels 0 and 1, and possibly also between levels 1 and 2 (if the root needs to move down), but no more that that: the work required is proportional to k-1
. In all, then,
T(2**k - 1) = 2 * T(2**(k-1) - 1) + (k - 1)*C
for some constant C
bounding the worst case for comparing elements at a pair of adjacent levels.
What about T(1)
? That's free! A tree with only 1 element is a already a heap - there's nothing to do.
T(1) = 0
One level above those leaves, trees have 3 elements. It costs (no more than) C
to move the smallest (for a min-heap; largest for a max-heap) to the top.
T(3) = C
One level above that trees have 7 elements. It costs T(3)
to heapify each of the subtrees, and then no more than 2*C
to move the root into place:
T(7) = 2*C + 2*C = 4*C
Continuing in the same way:
T(15) = 2* 4*C + 3*C = 11*C
T(31) = 2*11*C + 4*C = 26*C
T(63) = 2*26*C + 5*C = 57*C
...
T(2**k - 1) = (2**k - k - 1)*C
where the last line is a guess at the general form. You can verify that "it works" for all the specific lines before it, and then it's straightforward to prove it by induction.
So, where N = 2**k - 1
,
T(N) = (N - log2(N+1)) * C
which shows that T(N)
is bounded above by C*N
, so is certainly O(N)
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With