I am learning about heaps right now in my algorithms class and can't understand how, in practice, a max heap is better than a linked list that just stores the max value at the head pointer.
What is the point of having a node have two children that are less than it, why can't it just have one like a linked list? Basically, what can a heap do that a linked list can't? I feel like I am missing something fundamental here.
Both are implementations of the Priority queue abstract data type. The difference is in performance--how long it takes to insert or remove an item.
You could implement a priority queue as a simple unordered linked list. Whenever you add an item, just make it the new head of the list. When somebody asks for the maximum item, you search the list to find the largest one, remove it from the list, and return it. So insertion is very fast, and getting the max item requires a full scan of the list every time.
You could also maintain a sorted list. In that case, inserting an item is expensive because you have to search the list sequentially to find where the new item is to be inserted. But getting the max item is very fast: it's at the head of the list.
Imagine, though, that you have a list with a million items. In the first case, finding the maximum item requires you to search the entire list. In the second case, inserting an item requires, on average, that you search half of the list.
A max heap, on the other hand, is something of a compromise. It's not as fast on insertion as the unordered list, and it's not as fast on removal as the sorted list. But it's much faster than the unordered list on removal, and much faster than the the sorted list on insertion.
Below I list the asymptotic complexity of the two basic operations for these three types of priority queues:
Insert Delete-Max
Unordered list O(1) O(n)
Sorted list O(n) O(1)
Binary max heap O(log n) O(log n)
Imagine again that your priority queue has 1,000,000 items in it. You want to add an item and then remove the maximum. The unordered list will require a single, fast operation to insert an item, and you'll have to search 1,000,000 items to find the max.
The sorted list will take, on average, 500,000 operations to insert an item, but you can remove the max item very quickly.
log2(1000000) is about 20. The max heap will take at most 20 operations to insert an item, and will take at most 20 operations to remove an item.
It should be clear that if you need to maintain a priority queue, a max heap is much more efficient than a linked list.
Now, if your data are already sorted and you just need to go through them in order, then of course you wouldn't need a heap. You wouldn't even need a linked list. An array would be the way to go. But when you have a mixture of additions and removals, then maintaining a heap will give you much better performance.
Even if you're maintaining a very small queue--say, 10 items--the performance difference between a linked list and a max heap is very noticeable. If your application's performance depends on the speed of your priority queue implementation, it's worth taking the time to understand the heap data structure.
For a more detailed version of the above, see my blog entries on Priority queues and Heaps, and the follow-up articles that discuss implementation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With