Binary heaps are commonly used, e.g., in priority queues. The basic idea is that of an incomplete heap sort: you keep the data sorted "just enough" to get the top element out quickly.
While 4-ary heaps are theoretically worse than binary heaps, they also have some benefits. For example, they require fewer heap restructuring operations (as the heap is much shallower), while obviously needing more comparisons at each level. But (and that is probably their main benefit?) they may have better CPU cache locality. So some sources say that 3-ary and 4-ary heaps outperform both Fibonacci and binary heaps in practice.
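To make the depth argument concrete, here is a small illustrative calculation (not from the original benchmark; the class name is mine) of the height of a complete d-ary tree:

```java
// Illustration: the height of a d-ary heap with n elements is about log_d(n),
// so a 4-ary heap is roughly half as deep as a binary heap, while each
// sift-down step has to compare up to d children per level.
public class HeapHeight {
    static int height(long n, int d) {
        int h = 0;
        long capacity = 1;  // nodes a complete d-ary tree of height h can hold
        long levelSize = 1;
        while (capacity < n) {
            levelSize *= d;
            capacity += levelSize;
            h++;
        }
        return h;
    }

    public static void main(String[] args) {
        long n = 200_000;  // the element count used in the benchmark below
        System.out.println("binary heap height: " + height(n, 2)); // 17
        System.out.println("4-ary  heap height: " + height(n, 4)); // 9
    }
}
```

Half the levels means half the restructuring steps per poll, at the cost of up to four comparisons per level instead of two.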
They should not be much harder to implement; the additional cases are just a few extra if branches.
Has anyone experimented with 4-ary (and 3-ary) heaps for priority queues and done some benchmarking? In Java you never know whether they are faster or slower until you have benchmarked them extensively. And from all I've found via Google, it may be quite language- and use-case-dependent. Some sources say they found 3-ary to perform best for them.
Some more points:

- `PriorityQueue` obviously is a binary heap. But the class, for example, also lacks bulk loading and bulk repair support, or `replaceTopElement`, which can make a huge difference. Bulk loading, for example, is O(n) instead of O(n log n); bulk repair is essentially the same after adding a larger set of candidates. Tracking which parts of the heap are invalid can be done with a single integer.
- `replaceTopElement` is much cheaper than `poll` + `add` (just consider how a poll is implemented: replace the top element with the very last).

As per @ErichSchubert's suggestion, I have taken the implementations from ELKI and modified them into a 4-ary heap. It was a bit tricky to get the indexing right, as a lot of the publications around 4-ary heaps use formulas for 1-indexed arrays?!
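For reference, the 0-indexed arithmetic can be sketched like this (class and method names are illustrative, not ELKI's; the 1-indexed formulas from the literature are noted in the comments):

```java
// 4-ary heap index arithmetic for a 0-indexed array.
// In the 1-indexed literature: children of i are 4i-2 .. 4i+1, parent is (i+2)/4.
// Shifting everything down by one gives the 0-indexed forms below.
public class FourAryIndex {
    static int parent(int i)     { return (i - 1) >>> 2; } // (i - 1) / 4
    static int firstChild(int i) { return (i << 2) + 1; }  // 4 * i + 1
    // children of i are firstChild(i) .. firstChild(i) + 3

    public static void main(String[] args) {
        System.out.println(parent(5));     // 1 — node 5 is a child of node 1
        System.out.println(firstChild(1)); // 5 — children of 1 are 5, 6, 7, 8
    }
}
```

Using shifts instead of multiplication/division is one reason to prefer 4-ary over 3-ary from an implementation standpoint.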
Here are some early benchmark results, based on the ELKI unit test. 200000 `Double` objects are preallocated (to avoid measuring memory management too much) and shuffled. As a warmup, 10 iterations are performed for each heap, and 100 iterations for benchmarking, but I'll probably try to scale this up further. 10-30 seconds isn't that reliable for benchmarking yet, and OTOH I should try to measure standard deviations, too. In each iteration, the 200000 elements are added to the heap, then half of them are polled again. Yes, the workload could also be made more complex.
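The workload can be sketched roughly as follows, here driven against `java.util.PriorityQueue` (the ELKI heaps are exercised the same way; the class name, fixed seed, and sum accumulator are my additions, not the actual ELKI test code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.PriorityQueue;

// Sketch of the benchmark workload: preallocate and shuffle 200000 Doubles,
// then per iteration add them all to the heap and poll half of them back out.
public class HeapBench {
    public static void main(String[] args) {
        final int n = 200_000;
        ArrayList<Double> data = new ArrayList<>(n);
        for (int i = 0; i < n; i++) data.add((double) i); // preallocated
        Collections.shuffle(data, new java.util.Random(0L)); // fixed seed

        for (int iter = 0; iter < 10; iter++) run(data);  // warmup
        long start = System.nanoTime();
        for (int iter = 0; iter < 100; iter++) run(data); // measured
        System.out.printf("%.3f s%n", (System.nanoTime() - start) * 1e-9);
    }

    static double run(ArrayList<Double> data) {
        PriorityQueue<Double> pq = new PriorityQueue<>(data.size());
        for (Double d : data) pq.add(d);                  // add all elements
        double sum = 0;
        for (int i = 0; i < data.size() / 2; i++) sum += pq.poll(); // poll half
        return sum;  // consume the result so the JIT can't drop the work
    }
}
```

Returning and accumulating the polled values is a crude guard against dead-code elimination; a real harness handles that more rigorously.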
Here are the results:

- `DoubleMinHeap`: 10.371
- `DoubleMinHeap`: 12.356
- `Heap<Double>`: 37.458
- `PriorityQueue<Double>`: 45.875

So the difference between the 4-ary heap (probably not yet L1 cache-aligned!) and the ELKI heap for primitive doubles is not too big. Well, 10%-20% or so; it could be worse.
The difference between a heap for primitive `double`s and a heap for `Double` objects is much larger. And the ELKI `Heap` is indeed quite clearly faster than the Java `PriorityQueue` (but that one seems to have a high variance).
There was a slight "bug" in ELKI, though - at least the primitive heaps did not use the bulk loading code yet. It's there, it's just not being used, as every element repairs the heap immediately instead of delaying this until the next `poll()`. I fixed this for my experiments, essentially by removing a few lines and adding one `ensureValid();` call. Furthermore, I also don't have a 4-ary object heap yet, and I haven't included ELKI's `DoubleObjectMinHeap` yet... quite a lot to benchmark, and I'll probably give caliper a try for that.
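For reference, the O(n) bulk loading mentioned earlier is the classic bottom-up heapify. A minimal sketch on a binary min-heap stored in a plain `double[]` (illustrative, not ELKI's actual code; the 4-ary version differs only in the child arithmetic):

```java
// O(n) bulk repair (Floyd's bottom-up heapify) for a binary min-heap in a
// double[]: sift down every internal node, starting from the last one.
// This beats n repeated sift-up insertions, which cost O(n log n) total.
public class BulkHeapify {
    static void heapify(double[] a, int size) {
        for (int i = (size >>> 1) - 1; i >= 0; i--) siftDown(a, i, size);
    }

    static void siftDown(double[] a, int i, int size) {
        double v = a[i];
        while (true) {
            int c = 2 * i + 1;                        // left child
            if (c >= size) break;
            if (c + 1 < size && a[c + 1] < a[c]) c++; // pick the smaller child
            if (a[c] >= v) break;                     // heap property restored
            a[i] = a[c];                              // move child up
            i = c;
        }
        a[i] = v;
    }

    public static void main(String[] args) {
        double[] a = {5, 3, 8, 1, 9, 2};
        heapify(a, a.length);
        System.out.println(a[0]); // 1.0 — minimum at the top
    }
}
```

The "single integer" trick from the points above fits right in: track the lowest invalid index, and run the repair lazily on the affected subrange at the next `poll()` instead of after every insertion.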