Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do we sort via Heaps instead of Binary Search Trees?

A heap can be constructed from a list in O(n logn) time, because inserting an element into a heap takes O(logn) time and there are n elements.

Similarly, a binary search tree can be constructed from a list in O(n logn) time, because inserting an element into a BST takes on average logn time and there are n elements.

Traversing a heap from min-to-max takes O(n logn) time (because we have to pop n elements, and each pop requires an O(logn) sink operation). Traversing a BST from min-to-max takes O(n) time (literally just inorder traversal).

So, it appears to me that constructing both structures takes equal time, but BSTs are faster to iterate over. So, why do we use "Heapsort" instead of "BSTsort"?

Edit: Thank you to Tobias and lrlreon for your answers! In summary, below are the points why we use heaps instead of BSTs for sorting.

  • Construction of a heap can actually be done in O(n) time, not O(nlogn) time. This makes heap construction faster than BST construction.
  • Additionally, arrays can be easily transformed into heaps in-place, because heaps are always complete binary trees. BSTs can't be easily implemented as an array, since BSTs are not guaranteed to be complete binary trees. This means that BSTs require additional O(n) space allocation to sort, while Heaps require only O(1).
  • All operations on heaps are guaranteed to be O(logn) time. BSTs, unless balanced, may have O(n) operations. Heaps are dramatically simpler to implement than Balanced BSTs are.
  • If you need to modify a value after creating the heap, all you need to do is apply the sink or swim operations. Modifying a value in a BST is much more conceptually difficult.
like image 348
Shuklaswag Avatar asked Dec 25 '17 20:12

Shuklaswag


1 Answers

There are multiple reasons I can imagine you would want to prefer a (binary) heap over a search tree:

  • Construction: A binary heap can actually be constructed in O(n) time by applying the heapify operations bottom-up from the smallest to the largest subtrees.
  • Modification: All operations of the binary heap are rather straightforward:

    • Inserted an element at the end? Sift it up until the heap condition holds
    • Swapped the last element to the beginning? Swift it down until the heap condition holds
    • Changed the key of an entry? Sift it up or down depending on the direction of the change
  • Conceptual simplicity: Due to its implicit array representation, a binary heap can be implemented by anyone who knows the basic indexing scheme (2i+1, 2i+2 are the children of i) without considering many difficult special cases.
    If you look at these operations in a binary search tree, in theory they are also quite simple, but the tree has to be stored explicitly, e.g. using pointers, and most of the operations require the tree to be rebalanced to preserve the O(log n) height, which requires complicated rotations (red black-trees) or splitting/merging nodes (B-trees)

  • EDIT: Storage: As Irleon pointed out, to store a BST you also need more storage, as at least two child pointers need to be stored for every entry in addition to the value itself, which can be a large storage overhead especially for small value types. At the same time, the heap needs no additional pointers.

To answer your question about sorting: A BST takes O(n) time to traverse in-order, the construction process takes O(n log n) operations which, as mentioned before, are much more complex.

At the same time Heapsort can actually be implemented in-place by building a max-heap from the input array in O(n) time and and then repeatedly swapping the maximum element to tbe back and shrinking the heap. You can think of Heapsort as Insertion sort with a helpful data structure that lets you find the next maximum in O(log n) time.

like image 110
Tobias Ribizel Avatar answered Sep 28 '22 22:09

Tobias Ribizel