Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Binary heaps vs d-ary heaps

I've read that binary heaps are faster at delete minimum operations and d-ary heaps are faster at at decrease priority operations (although I don't get why), but then I've also read that a 4-heap is faster at both of them compared to a binary heap.

So when do I use a binary heap and when do I use a d-ary heap? And how do I decide what the d should be for the d-ary heap?

like image 350
Just some guy Avatar asked Mar 18 '15 15:03

Just some guy


People also ask

What are binary heaps good for?

Heaps are used when the highest or lowest order/priority element needs to be removed. They allow quick access to this item in O(1) time. One use of a heap is to implement a priority queue. Binary heaps are usually implemented using arrays, which save overhead cost of storing pointers to child nodes.

What are the two types of binary heap?

The ordering can be one of two types: the min-heap property: the value of each node is greater than or equal to the value of its parent, with the minimum-value element at the root. the max-heap property: the value of each node is less than or equal to the value of its parent, with the maximum-value element at the root.

What is the height of a d-ary heap?

Notice taht the binary heap procedures are a special case of the above procedures when d = 2. b. Since each node has d children, the height of a d-ary heap with n nodes is Θ(logd n) = Θ(lg d/ lg n).

What is 3 ary max heap?

A 3-ary max heap is like a binary max heap, but instead of 2 children, nodes have 3 children. A 3-ary heap can be represented by an array as follows: The root is stored in the first location, a[0], nodes in the next level, from left to right, is stored from a[1] to a[3].


1 Answers

There are a few different factors at play here that, I believe, make it possible for all of the statements you've made to be true.

To see why this is, let's start by thinking about how a decrease-key in a d-ary heap works (we don't need to talk about binary heaps separately, since a binary heap is just a 2-ary heap). When performing a decrease-key, we change the priority of a node in the tree, then repeatedly swap it up with its parent until it either hits the root of the tree or its priority ends up becoming smaller than the priority of its parent. The number of times we have to do a swap is, in the worst case, given by the height of the d-ary heap. Since the number of nodes in each layer of a d-ary heap grows exponentially by a factor of d at each step, the height of a d-ary heap is O(logd n) = O(log n / log d). This means that if you increase the value of d, the height of the d-ary heap will decrease, so decrease-keys and insertions will take less time. If you think about an extreme case, if you have a 10100-ary heap, the number of layers in the tree will be about 100 times fewer than in a binary heap, so a decrease-key or insert will be about 100 times faster.

On the other hand, think about how a dequeue operation would work. To perform a dequeue, we swap the last leaf for the root, then repeatedly do the following: we scan across all the children of the current node, and if any of them are smaller than the current node, we swap the current node with the smallest of its children. Each of these iterations will require O(d) total comparisons to find the smallest child, and the number of iterations is given by the number of layers in the tree, which we've seen earlier is O(log n / log d). This means that the cost of a dequeue in a d-ary heap is O(d log n / log d). Since d grows much faster than log d (exponentially faster, in fact), as we increase d, the asymptotic - and actual - cost of a dequeue starts to rise. For example, in a 10100-ary heap, you might have to compare each node to 10100 children at each step, which is going to take a really long time! Therefore, d-ary heaps, as d gets larger and larger, tend to have much slower dequeues than binary heaps.

Now on to your last question: how is it still possible that a 4-ary heap would outperform a binary heap given the information here? I'm going to be perfectly honest and say that I have no idea if this is true, but that it (a) probably depends on the hardware and (b) wouldn't surprise me. Keep in mind that all of the preceding analyses tried to bound the cost of the d-ary heap operations by looking at quantities like the number of layers in the heap and the number of swaps made. However, this leaves out a lot of other factors, such as the cost of finding parents and children and locality of reference. For the first of these, note that in a d-ary heap, you can find your parent node by dividing your index by d. For d's that are perfect powers of two, this can be implemented with a simple, inexpensive bitshift (since n / 2k = n >> k). For odd numbers or numbers that aren't powers of two, this requires a division, which (in some architectures) is more expensive than a bit shift. Additionally, there's the impact of locality of reference. Computers these days have a huge number of layers of caches in memory, and the cost of accessing memory that is in cache can be hundreds or thousands of times faster than the cost of accessing memory not in the cache. As you increase the value of d in a d-ary heap, there are fewer layers in the tree and the elements accessed are closer together, giving better locality. Finding the sweet spot probably requires some experimentation, and if it happens to be that d = 4 is the best on your machine, then go for it!

EDIT: as @moreON pointed out, for d = 4, the number of layers in the heap goes down by a factor of two and the number of comparisons per later goes up by a factor of two, which may actually give better overall performance due to cache effects and a lower overall tree height. Therefore, it's probably a good candidate to outperform a binary heap.

Hope this helps!

like image 71
templatetypedef Avatar answered Oct 17 '22 04:10

templatetypedef