Hilbert sort by divide and conquer algorithm?

Tags:

I'm trying to sort d-dimensional data vectors by their Hilbert order, for bulk-loading a spatial index.

However, I do not want to compute the Hilbert value for each point explicitly, which in particular requires setting a particular precision. In high-dimensional data, this involves a precision such as 32*d bits, which becomes quite messy to do efficiently. When the data is distributed unevenly, some of these calculations are unnecessary, and extra precision for parts of the data set are necessary.

Instead, I'm trying to do a partitioning approach. When you look at the 2D first order hilbert curve

1   4 |   | 2---3

I'd split the data along the x-axis first, so that the first part (not necessarily containing half of the objects!) will consist of 1 and 2 (not yet sorted) and the second part will have objects from 3 and 4 only. Next, I'd split each half again, on the Y axis, but reverse the order in 3-4.

So essentially, I want to perform a divide-and-conquer strategy (closely related to QuickSort - on evenly distributed data this should even be optimal!), and only compute the necessary "bits" of the hilbert index as needed. So assuming there is a single object in "1", then there is no need to compute the full representation of it; and if the objects are evenly distributed, partition sizes will drop quickly.

I do know the usual textbook approach of converting to long, gray-coding, dimension interleaving. This is not what I'm looking for (there are plenty of examples of this available). I explicitly want a lazy divide-and-conquer sorting only. Plus, I need more than 2D.

Does anyone know of an article or hilbert-sorting algorithm that works this way? Or a key idea how to get the "rotations" right, which representation to choose for this? In particular in higher dimensionalities... in 2D it is trivial; 1 is rotated +y, +x, while 4 is -y,-x (rotated and flipped). But in higher dimensionalities this gets more tricky, I guess.

(The result should of course be the same as when sorting the objects by their hilbert order with a sufficiently large precision right away; I'm just trying to save the time computing the full representation when not needed, and having to manage it. Many people keep a hashmap "object to hilbert number" that is rather expensive.)

Similar approaches should be possible for Peano curves and Z-curve, and probably a bit easier to implement... I should probably try these first (Z-curve is already working - it indeed boils down to something closely resembling a QuickSort, using the appropriate mean/grid value as virtual pivot and cycling through dimensions for each iteration).

Edit: see below for how I solved it for Z and peano curves. It is also working for 2D Hilbert curves already. But I do not have the rotations and inversion right yet for Hilbert curves.

876

asked Dec 10 '11 20:12

Has QUIT--Anony-Mousse

2 Answers

Use radix sort. Split each 1-dimensional index to d .. 32 parts, each of size 1 .. 32/d bits. Then (from high-order bits to low-order bits) for each index piece compute its Hilbert value and shuffle objects to proper bins.

This should work well with both evenly and unevenly distributed data, both Hilbert ordering or Z-order. And no multi-precision calculations needed.

One detail about converting index pieces to Hilbert order:

first extract necessary bits,
then interleave bits from all dimensions,
then convert 1-dimensional indexes to inverse Gray code.

If the indexes are stored in doubles:

If indexes may be negative, add some value to make everything positive and thus simplify the task.
Determine the smallest integer power of 2, which is greater than all the indexes and divide all indexes to this value
Multiply the index to 2^(necessary number of bits for current sorting step). Truncate the result, convert it to integer, and use it for Hilbert ordering (interleave and compute the inverse Gray code)
Subtract the result, truncated on previous step, from the index: index = index - i

Coming to your variant of radix sort, i'd suggest to extend zsort (to make hilbertsort out of zsort) with two binary arrays of size d (one used mostly as a stack, other is used to invert index bits) and the rotation value (used to rearrange dimensions).

If top value in the stack is 1, change pivotize(... ascending) to pivotize(... descending), and then for the first part of the recursion, push this top value to the stack, for second one - push the inverse of this value. This stack should be restored after each recursion. It contains the "decision tree" of last d recursions of radix sort procedure (in inverse Gray code).

After d recursions this "decision tree" stack should be used to recalculate both the rotation value and the array of inversions. The exact way how to do it is non-trivial. It may be found in the following links: hilbert.c or hilbert.c.

145

answered Oct 10 '22 05:10

Evgeny Kluev

You can compute the hilbert curve from f(x)=y directly without using recursion or L-systems or divide and conquer. Basically it's a gray code or hamiltonian path traversal. You can find a good description at Nick's spatial index hilbert curve quadtree blog or from the book hacker's delight. Or take a look at monotonic n-ary gray code. I've written an implementation in php including a moore curve.

answered Oct 10 '22 04:10

Micromega

Related questions
                            
                                How to efficiently implement closures in LLVM IR?
                            
                                sliding one fragment over another in Android
                            
                                LSH Libraries in Java
                            
                                d3.js - evenly spaced bars on a time scale
                            
                                Memory Leaks in Android application
                            
                                Why java.util.concurrent.atomic.AtomicBoolean is internally implemented with int?
                            
                                Is there a YAML Database?
                            
                                Installing PlatformToolset v110 without installing Visual Studio 2012
                            
                                iOS - Core Data and Server Database Synchronization Best Practices [duplicate]
                            
                                How to compile code with visual studio using makefile
                            
                                org.openqa.selenium.firefox.NotConnectedException: Unable to connect to host 127.0.0.1 on port 7055 after 45000 ms
                            
                                How to start Titan graph server and connect with gremlin?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With