Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

4-ary heaps in Java

Binary heaps are commonly used in e.g. priority queues. The basic idea is that of an incomplete heap sort: you keep the data sorted "just enough" to get out the top element quickly.

While 4-ary heaps are theoretically worse than binary heaps, they do also have some benefits. For example, they will require less heap restructuring operations (as the heap is much shallower), while obvisouly needing more comparisons at each level. But (and that probably is their main benefit?) they may have better CPU cache locality. So some sources say that 3-ary and 4-ary heaps outperform both Fibonacci and binary heaps in practise. They should not be much harder to implement, the additional cases are just some extra if cases.

Has anyone experimented with 4-ary heaps (and 3-ary) for priority queues and done some benchmarking? In Java you never know if they are faster or slower before you benchmarked them extensively. And from all I've found via Google, it may be quite language and use case dependant. Some sources say that they found 3-ary to perform best for them.

Some more points:

  • PriorityQueue obviously is a binary heap. But the class for example also lacks bulk loading and bulk repair support, or replaceTopElement which can make a huge difference. Bulk loading for example is O(n) instead of O(n log n); bulk repair is essentially the same after adding a larger set of candidates. Tracking which parts of the heap are invalid can be done with a single integer. replaceTopElement is much cheaper than poll + add (just consider how a poll is implemented: replace the top element with the very last)
  • While heaps of course are popular for complex objects, the priority often is an integer of double value. It's not as if we are comparing strings here. Usually it is a (primitive) priority
  • PQs are often used just to get the top k elements. For example A*-search can terminate when the goal is reached. All the less good paths are then discarded. So the queue is never completely emptied. In a 4-way heap, there is less order: approximately half as much (half as many parent nodes). So it will impose less order on these elements that are not needed. (This of course differs if you intend to empty your heap completely, e.g. because you are doing heap sort.)
like image 518
Has QUIT--Anony-Mousse Avatar asked Dec 24 '12 00:12

Has QUIT--Anony-Mousse


1 Answers

As per @ErichSchubert's suggestion, I have taken the implementations from ELKI and modified them into a 4-ary heap. It was a bit trick to get the indexing right, as a lot of the publications around 4-ary heaps use formulas for 1-indexed arrays?!?

Here are some early benchmark results, based on the ELKI unit test. 200000 Double objects are preallocated (to avoid measuring memory management too much) and shuffled.

As a warmup, 10 iterations are performed for each heap, for benchmarking 100 iterations, but I'll probably try to scale this up further. 10-30 seconds isn't that realiably for benchmarking yet, and OTOH I should try to measure standard deviations, too. In each iteration, the 200000 elements are added to the heap, then half of them are polled again. Yes, the workload could also be made more complex.

Here are the results:

  • My 4-ary DoubleMinHeap: 10.371
  • ELKI DoubleMinHeap: 12.356
  • ELKI Heap<Double>: 37.458
  • Java PriorityQueue<Double>: 45.875

So the difference between the 4-ary heap (probably not yet L1 cache-aligned!) and the ELKI heap for primitive doubles is not too big. Well, 10%-20% or so; it could be worse.

The difference between a heap for primitive doubles and a heap for Double objects is much larger. And the ELKI Heap is indeed quite clearly faster than the Java PriorityQueue (but that one seems to have a high variance). There was a slight "bug" in ELKI, though - at least the primitive heaps did not use the bulk loading code yet. It's there, it's just not being used, as every elements repairs the heap immediately instead of delaying this until the next poll(). I fixed this for my experiments, essentially by removing a few lines and adding one ensureValid(); call. Furthermore, I also don't have a 4-ary object heap yet, and I havn't include ELKI's DoubleObjectMinHeap yet... quite a lot to benchmark, and I'll probably give caliper a try for that.

like image 170
Has QUIT--Anony-Mousse Avatar answered Sep 23 '22 23:09

Has QUIT--Anony-Mousse