
Python heapify() time complexity

def heapify(A):
    # Build a min-heap in place: sift down each internal node,
    # working from the last non-leaf back to the root.
    # (range replaces the original Python 2 xrange.)
    for root in range(len(A) // 2 - 1, -1, -1):
        rootVal = A[root]
        child = 2 * root + 1
        while child < len(A):
            # pick the smaller of the two children
            if child + 1 < len(A) and A[child] > A[child + 1]:
                child += 1
            # stop once the sifted value is no larger than the smaller child
            if rootVal <= A[child]:
                break
            # move the child up one level and continue sifting down
            A[child], A[(child - 1) // 2] = A[(child - 1) // 2], A[child]
            child = child * 2 + 1

This is an implementation similar to Python's heapq.heapify(). The docs say this function runs in O(n), but it looks like it does log(n) operations for each of n/2 elements. Why is it O(n)?

asked Aug 07 '18 21:08

People also ask

What is the time complexity of Heapify?

The basic idea behind why the time is linear is due to the fact that the time complexity of heapify depends on where the node is within the heap. It takes O(1) time when the node is a leaf node (which makes up at least half of the nodes) and O(log n) time when it's at the root.

How is python Heapify linear time?

A heap is essentially a balanced binary tree with the property that the value of each parent node is less than or equal to any of its children for the MinHeap implementation, and greater than or equal to any of its children for the MaxHeap implementation.

What does Heapify do python?

heapify − This function converts a regular list to a heap in place. In the resulting heap the smallest element is at index position 0, but the rest of the elements are not necessarily sorted. heappush − This function adds an element to the heap without breaking the heap invariant.
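That behavior can be seen directly with the standard library (a short sketch):

```python
import heapq

data = [9, 4, 7, 1, 3]
heapq.heapify(data)      # rearranges the list in place, O(n)
assert data[0] == 1      # smallest element ends up at index 0
# the rest of the list satisfies the heap invariant, not sorted order

heapq.heappush(data, 0)  # push preserves the invariant, O(log n)
assert data[0] == 0
```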

What is the time complexity of heapsort if the Heapify function costs O 1 time?

The heapsort algorithm itself has O(n log n) time complexity using either version of heapify.


1 Answer

It requires more careful analysis, such as you'll find here. The basic insight is that only the root of the heap actually has depth log2(len(A)). Down at the nodes one level above a leaf - where half the nodes live - a leaf is hit on the first inner-loop iteration.
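In the array representation the question uses, the leaves are exactly the indices from len(A)//2 onward, so a quick check confirms that at least half the nodes are leaves:

```python
def leaf_count(n):
    # in an array-based heap of n elements, indices n//2 .. n-1 are leaves
    # (they have no child index 2*i+1 < n)
    return n - n // 2

for n in [1, 2, 7, 100, 101]:
    assert 2 * leaf_count(n) >= n  # leaves make up at least half the nodes
```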

"Exact" derivation

Waving hands some, when the algorithm is looking at a node at the root of a subtree with N elements, there are about N/2 elements in each subtree, and then it takes work proportional to log(N) to merge the root and those sub-heaps into a single heap. So the total time T(N) required is about

T(N) = 2*T(N/2) + O(log(N))

That's an uncommon recurrence. The Akra–Bazzi method can be used to deduce that it's O(N), though.
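Evaluating that recurrence numerically (taking the O(log(N)) term as exactly log2(N)) shows T(N)/N settling toward a constant rather than growing, which is what O(N) predicts:

```python
from math import log2

def T(n):
    # recurrence T(N) = 2*T(N/2) + log2(N), with T(1) = 0
    if n <= 1:
        return 0.0
    return 2 * T(n // 2) + log2(n)

for n in [2**10, 2**15, 2**20]:
    print(n, T(n) / n)       # ratio approaches a constant (here, 2)
assert T(2**20) / 2**20 < 2  # bounded by a constant times N
```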

I think more informative, and certainly more satisfying, is to derive an exact solution from scratch. Toward that end, I'll only talk about complete binary trees: as full as possible on every level. Then there are 2**N - 1 elements in total, and all subtrees are also complete binary trees. This sidesteps mounds of pointless details about how to proceed when things aren't exactly balanced.

When we're looking at a subtree with 2**k - 1 elements, its two subtrees have exactly 2**(k-1) - 1 elements each, and there are k levels. For example, for a tree with 7 elements, there's 1 element at the root, 2 elements on the second level, and 4 on the third. After the subtrees are heapified, the root has to be moved into place, moving it down 0, 1, or 2 levels. This requires doing comparisons between levels 0 and 1, and possibly also between levels 1 and 2 (if the root needs to move down), but no more than that: the work required is proportional to k-1. In all, then,

T(2**k - 1) = 2 * T(2**(k-1) - 1) + (k - 1)*C

for some constant C bounding the worst case for comparing elements at a pair of adjacent levels.

What about T(1)? That's free! A tree with only 1 element is already a heap - there's nothing to do.

T(1) = 0

One level above those leaves, trees have 3 elements. It costs (no more than) C to move the smallest (for a min-heap; largest for a max-heap) to the top.

T(3) = C

One level above that, trees have 7 elements. It costs T(3) to heapify each of the subtrees, and then no more than 2*C to move the root into place:

T(7) = 2*C + 2*C = 4*C

Continuing in the same way:

T(15) = 2* 4*C + 3*C = 11*C
T(31) = 2*11*C + 4*C = 26*C
T(63) = 2*26*C + 5*C = 57*C
...
T(2**k - 1) = (2**k - k - 1)*C

where the last line is a guess at the general form. You can verify that "it works" for all the specific lines before it, and then it's straightforward to prove it by induction.
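The induction can also be checked mechanically, taking C = 1 and comparing the recurrence against the guessed closed form:

```python
def T(k):
    # cost T(2**k - 1) per the recurrence, with C = 1
    if k == 1:
        return 0                       # T(1) = 0: a single element is a heap
    return 2 * T(k - 1) + (k - 1)      # two subtrees, plus k-1 to place the root

for k in range(1, 20):
    assert T(k) == 2**k - k - 1        # matches the guessed closed form
```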

So, where N = 2**k - 1,

T(N) = (N - log2(N+1)) * C

which shows that T(N) is bounded above by C*N, so is certainly O(N).
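As an empirical cross-check, instrumenting the sift-down from the question to count comparisons shows the total staying within a small constant multiple of N (here 3*N: at most two comparisons per inner-loop iteration, and roughly 1.5*N iterations in all):

```python
import random

def heapify_counting(A):
    # the sift-down heapify from the question, instrumented
    # to count inter-element comparisons
    comps = 0
    for root in range(len(A) // 2 - 1, -1, -1):
        rootVal = A[root]
        child = 2 * root + 1
        while child < len(A):
            if child + 1 < len(A):
                comps += 1                  # sibling comparison
                if A[child] > A[child + 1]:
                    child += 1
            comps += 1                      # sifted-value vs child comparison
            if rootVal <= A[child]:
                break
            A[child], A[(child - 1) // 2] = A[(child - 1) // 2], A[child]
            child = child * 2 + 1
    return comps

for n in [1_000, 10_000, 100_000]:
    A = [random.random() for _ in range(n)]
    comps = heapify_counting(A)
    assert comps <= 3 * n    # comparison count grows linearly, not n*log(n)
    print(n, comps / n)
```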

Tim Peters answered Oct 05 '22 23:10