
worst case in MAX-HEAPIFY: "the worst case occurs when the bottom level of the tree is exactly half full"

In CLRS, third edition, page 155, the analysis of MAX-HEAPIFY states:

"the worst case occurs when the bottom level of the tree is exactly half full"  

I guess the reason is that in this case MAX-HEAPIFY has to "float down" through the left subtree.
But what I couldn't understand is: why "half full"?
MAX-HEAPIFY can also float down if the left subtree has only one leaf. So why isn't that considered the worst case?

Happy Mittal asked Jul 28 '11 13:07




2 Answers

Read the entire context:

The children's subtrees each have size at most 2n/3 - the worst case occurs when the last row of the tree is exactly half full

Since the running time T(n) is expressed in terms of the number of elements in the tree (n), and the recursion steps into one of the subtrees, we need an upper bound on the number of nodes in a subtree, relative to n. That yields T(n) ≤ T(max number of nodes in a subtree) + O(1).

A subtree is largest relative to n when the final row is as full as possible on one side and as empty as possible on the other. This is the "half full" case, and in it the left subtree size is bounded by 2n/3.
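To see where the 2n/3 bound leads, here is a short sketch of how the recurrence resolves. It assumes the standard master theorem from CLRS, with a = 1 and b = 3/2; the unrolled form is only an informal check of the same result.

```latex
\begin{align*}
T(n) &\le T\!\left(\tfrac{2n}{3}\right) + \Theta(1) \\
\text{Master theorem: } & a = 1,\; b = \tfrac{3}{2},\; n^{\log_b a} = n^{0} = 1,\; f(n) = \Theta(1)
  \;\Rightarrow\; \text{case 2} \\
T(n) &= O(\lg n) \\[4pt]
\text{Unrolled: } & \left(\tfrac{2}{3}\right)^{k} n \le 1
  \iff k \ge \log_{3/2} n = O(\lg n)
\end{align*}
```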

If you're proposing a case with only a few nodes, then that's irrelevant, since all base cases can be considered O(1) and ignored.

davin answered Sep 28 '22 13:09


I know there's already an accepted answer, but for those who've got the same question and are still a little bit confused (as I was), or find something unclear, here's a slightly longer and more detailed explanation.

Though it might sound boring or redundant, we have to be very clear about the exact definitions, because once we pay attention to those details, proving things becomes much easier.

From CLRS section 6.1, The (binary) heap data structure is an array object that we can view as a nearly complete binary tree

From Wikipedia, In a complete binary tree, every level, except possibly the last, is completely filled, and all nodes in the last level are as far left as possible.

Also, from Wikipedia, A balanced binary tree is a binary tree structure in which the left and right sub-trees of every node differ in height by no more than 1.

So, at the root, the heights of the left and right sub-trees can differ by at most 1.

Now, Consider a tree T, and let the height of the left sub-tree = h+1 and the height of the right sub-tree = h

What's the worst-case in MAX_HEAPIFY? The worst-case is when we end up doing more comparisons and swaps while trying to maintain the heap property.

If MAX_HEAPIFY recurses all the way down the longest root-to-leaf path, we have a candidate for the worst case.
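As a rough illustration of "going through the longest path", here is a minimal Python sketch of MAX_HEAPIFY. The 0-indexed array layout, the iterative loop (instead of CLRS's recursion) and the returned swap counter are assumptions made for this example, not part of the book's pseudocode.

```python
def max_heapify(a, i, n=None):
    """Sift a[i] down in a 0-indexed array heap and return the number of swaps.

    Assumes both subtrees of i already satisfy the max-heap property.
    """
    if n is None:
        n = len(a)
    swaps = 0
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return swaps          # heap property holds at i, stop
        a[i], a[largest] = a[largest], a[i]
        i = largest               # continue in the subtree we swapped into
        swaps += 1


# 11 nodes: levels of 1, 2 and 4 nodes, plus a bottom level with 4 of 8 slots filled.
heap = [0, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
swaps = max_heapify(heap, 0)
```

In this example the root value 0 travels down the left side to a leaf on the bottom level, so the number of swaps equals the height of the tree.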

Well, all the longest paths are in the left sub-tree (as its height is h+1). Why not the right sub-tree? Remember the definition, all the nodes in the last level have to be as far left as possible.

So, to get as many longest paths as possible, we oughta make the left sub-tree FULL (Why? So that we have more paths to choose from and can opt for the path that gives the worst-case time). Since the left sub-tree has height h+1, it will have 2^(h+1) leaf nodes, and therefore there are 2^(h+1) longest paths from the root. This is the maximum possible number of longest paths in tree T, given that its left sub-tree has height h+1 and its right sub-tree has height h.

Here's the image of the tree structure in the worst-case situation.

From the above image, consider that the yellow(left) and pink(right) sub-trees have x nodes each. The pink portion is a complete right sub-tree and the yellow portion is the left sub-tree excluding the last level.

Notice that both the yellow(left) and the pink(right) sub-trees have height h.

Now, from the start we've taken the left sub-tree as a whole (the yellow portion plus the last level) to have height h+1. So how many nodes do we have to add in the last level, i.e. below the yellow portion, to make the left sub-tree completely full?

Well, the bottom-most layer of the yellow portion has ⌈x/2⌉ nodes (the number of leaves in a tree/subtree having n nodes is ⌈n/2⌉; for a proof visit this link). If we add 2 children to each of these leaves, we add 2·⌈x/2⌉ = x + 1 ≈ x nodes in total (x is odd here, because the yellow portion is a full tree).
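The leaf-count fact used above can be checked numerically. This is a small sketch assuming the usual 0-indexed array layout of a heap, where node i is a leaf exactly when its left child index 2i + 1 falls outside the array; `count_leaves` is a helper name made up for this example.

```python
def count_leaves(n):
    """Count the leaves of an n-node array heap (0-indexed) by brute force."""
    return sum(1 for i in range(n) if 2 * i + 1 >= n)


# Compare against ceil(n / 2), written as (n + 1) // 2 for integers.
matches = all(count_leaves(n) == (n + 1) // 2 for n in range(1, 200))
```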

With this addition, we make the left sub-tree of height h+1 (the yellow portion with height h + this one last level added) and FULL, hence meeting the worst-case criteria.

Since the left sub-tree is FULL and the right sub-tree has nothing on that level, the bottom level of the whole tree is exactly half full.

Now, the most important question: why don't we add more nodes, or add nodes in the right sub-tree? Well, if we add more nodes, they have to go into the right sub-tree (as the left sub-tree is FULL), which balances the tree out more. As the tree gets more balanced, we move towards the best-case scenario and away from the worst case.

Also, how many nodes do we have in total?

Total nodes of the tree n = 1 (the root) + x (from the yellow portion) + x (from the pink portion) + (x + 1) (the last level added below the yellow portion) = 3x + 2 ≈ 3x

Notice, as a by-product, that the left sub-tree contains 2x + 1 = (2n - 1)/3 nodes in total, i.e. at most 2n/3 nodes.
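To make the 2n/3 bound concrete, here is a small Python sketch that counts the nodes in the root's left sub-tree for an n-node heap. It assumes the 0-indexed array layout (the left child of the root is index 1); `left_subtree_size` is a helper name invented for this example.

```python
def left_subtree_size(n):
    """Number of nodes in the left subtree of the root of an n-node array heap."""
    size = 0
    first, width = 1, 1  # at each depth the left subtree occupies indices first..first+width-1
    while first < n:
        size += min(width, n - first)  # this level may be only partly filled
        first = 2 * first + 1          # leftmost index one level deeper
        width *= 2
    return size


# Among all heaps of height 3 (8 to 15 nodes), find the n whose left subtree
# holds the largest fraction of the nodes.
worst_n = max(range(8, 16), key=lambda n: left_subtree_size(n) / n)
bound_holds = all(3 * left_subtree_size(n) <= 2 * n for n in range(1, 5000))
```

For height 3 the largest fraction occurs at n = 11, which is the heap whose bottom level is exactly half full (7 of its 11 nodes are in the left sub-tree). The case from the question, a single leaf on the bottom level (n = 8), gives only 4 of 8 nodes, so the recursion shrinks the problem faster there.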

Aarush Aggarwal answered Sep 28 '22 11:09