 

Hilbert sort by divide and conquer algorithm?


I'm trying to sort d-dimensional data vectors by their Hilbert order, for bulk-loading a spatial index.

However, I do not want to compute the Hilbert value for each point explicitly, because that requires fixing a precision up front. For high-dimensional data this means a precision such as 32*d bits, which is quite messy to handle efficiently. When the data is distributed unevenly, some of these calculations are unnecessary, while other parts of the data set need extra precision.

Instead, I'm trying a partitioning approach. Look at the first-order 2D Hilbert curve:

    1   4
    |   |
    2---3

I'd split the data along the x-axis first, so that the first part (not necessarily containing half of the objects!) will consist of 1 and 2 (not yet sorted) and the second part will have objects from 3 and 4 only. Next, I'd split each half again, on the Y axis, but reverse the order in 3-4.

So essentially, I want to perform a divide-and-conquer strategy (closely related to QuickSort; on evenly distributed data this should even be optimal!), and only compute the "bits" of the Hilbert index as they are needed. If there is a single object in "1", there is no need to compute its full representation; and if the objects are evenly distributed, partition sizes will drop quickly.

I do know the usual textbook approach of converting to long, gray-coding, dimension interleaving. This is not what I'm looking for (there are plenty of examples of this available). I explicitly want a lazy divide-and-conquer sorting only. Plus, I need more than 2D.

Does anyone know of an article or Hilbert-sorting algorithm that works this way? Or a key idea for getting the "rotations" right, and which representation to choose for them? In particular in higher dimensions... in 2D it is trivial; 1 is rotated +y, +x, while 4 is -y, -x (rotated and flipped). But in higher dimensions this gets trickier, I guess.

(The result should of course be the same as when sorting the objects by their Hilbert order with a sufficiently large precision right away; I'm just trying to save the time spent computing the full representation when it is not needed, and the effort of managing it. Many people keep a hashmap "object to Hilbert number", which is rather expensive.)
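For the 2D case, the lazy partitioning described above could look roughly like the sketch below. It is only an illustration, not the asker's code: the function name `hilbert_sort_2d`, the bounding square `(x0, y0, size)` and the `max_depth` cap (to stop on duplicate points) are all assumptions. The orientation is kept as two flags, `swap` and `flip`, that are toggled per quadrant instead of transforming every point, and the curve starts at the lower-left corner and ends at the lower-right one.

```python
def hilbert_sort_2d(points, x0, y0, size, swap=False, flip=False, max_depth=64):
    """Sort 2D points lying in [x0, x0+size) x [y0, y0+size) along a Hilbert curve.

    Only as many levels are refined as are needed to separate the points.
    """
    if len(points) <= 1 or max_depth == 0:
        return list(points)
    half = size / 2.0
    # Partition by comparing against the cell midpoint (the "virtual pivot").
    groups = {}
    for p in points:
        bx = int(p[0] >= x0 + half)
        by = int(p[1] >= y0 + half)
        groups.setdefault((bx, by), []).append(p)
    children = []
    for (bx, by), group in groups.items():
        # Map the global quadrant bits into the local (rotated) frame.
        rx, ry = bx ^ flip, by ^ flip
        if swap:
            rx, ry = ry, rx
        rank = (3 * rx) ^ ry  # position of this quadrant along the curve
        # First and last quadrants are transposed; the last one is also mirrored.
        child_swap = swap ^ (ry == 0)
        child_flip = flip ^ (ry == 0 and rx == 1)
        children.append((rank, bx, by, child_swap, child_flip, group))
    children.sort(key=lambda c: c[0])
    out = []
    for _, bx, by, child_swap, child_flip, group in children:
        out.extend(hilbert_sort_2d(group, x0 + bx * half, y0 + by * half, half,
                                   child_swap, child_flip, max_depth - 1))
    return out
```

Two flags are enough here because the four 2D orientations combine by XOR; this shortcut is specific to 2D and does not carry over directly to higher dimensions.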

Similar approaches should be possible for Peano curves and Z-curve, and probably a bit easier to implement... I should probably try these first (Z-curve is already working - it indeed boils down to something closely resembling a QuickSort, using the appropriate mean/grid value as virtual pivot and cycling through dimensions for each iteration).
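To make the Z-curve variant concrete, here is a minimal sketch of such a QuickSort-like routine. The name `zsort`, the choice of the data's bounding box as the initial cell, and the `max_depth` cap for duplicate points are assumptions made for illustration; dimension 0 is treated as the most significant one.

```python
def zsort(points, lo=None, hi=None, axis=0, max_depth=64):
    """Sort d-dimensional points in Z-order by recursive midpoint partitioning."""
    if len(points) <= 1 or max_depth == 0:
        return list(points)
    d = len(points[0])
    if lo is None or hi is None:
        # Assumption: start from the bounding box of the data.
        lo = [min(p[k] for p in points) for k in range(d)]
        hi = [max(p[k] for p in points) for k in range(d)]
    # The cell midpoint acts as a virtual pivot; it need not be a data point.
    mid = (lo[axis] + hi[axis]) / 2.0
    low = [p for p in points if p[axis] < mid]
    high = [p for p in points if p[axis] >= mid]
    low_hi = list(hi)
    low_hi[axis] = mid
    high_lo = list(lo)
    high_lo[axis] = mid
    nxt = (axis + 1) % d  # cycle through the dimensions
    return (zsort(low, lo, low_hi, nxt, max_depth - 1)
            + zsort(high, high_lo, hi, nxt, max_depth - 1))
```

For example, `zsort([(1, 1), (0, 1), (1, 0), (0, 0)])` visits the four cells in the order (0, 0), (0, 1), (1, 0), (1, 1). A partition with a single point is never refined further, which is where the laziness comes from.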

Edit: see below for how I solved it for Z and Peano curves. It is also working for 2D Hilbert curves already. But I do not have the rotations and inversions right yet for Hilbert curves in higher dimensions.

Has QUIT--Anony-Mousse asked Dec 10 '11 20:12





2 Answers

Use radix sort. Split each 1-dimensional index into somewhere between d and 32 pieces, each 1 to 32/d bits wide. Then, going from the high-order bits to the low-order bits, compute the Hilbert value of each index piece and shuffle the objects into the proper bins.

This should work well with both evenly and unevenly distributed data, and for both Hilbert order and Z-order. No multi-precision calculations are needed.

One detail about converting index pieces to Hilbert order:

  • first extract necessary bits,
  • then interleave bits from all dimensions,
  • then convert 1-dimensional indexes to inverse Gray code.
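The last two steps of the list above can be sketched as follows. The helper names `interleave` and `inverse_gray` are made up for this illustration, and dimension 0 is assumed to supply the most significant bit of each group. With one bit per dimension in 2D this already gives the first-order Hilbert rank; deeper levels additionally need the rotation and inversion bookkeeping discussed further down.

```python
def interleave(pieces, bits):
    """Interleave the low `bits` bits of each per-dimension integer piece."""
    out = 0
    for b in range(bits - 1, -1, -1):      # from high-order to low-order bit
        for v in pieces:                   # dimension 0 first
            out = (out << 1) | ((v >> b) & 1)
    return out


def inverse_gray(g):
    """Invert the reflected binary Gray code g = n ^ (n >> 1)."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


# First-order 2D ranks for the cells (0,0), (0,1), (1,1), (1,0):
ranks = [inverse_gray(interleave([x, y], 1))
         for x, y in [(0, 0), (0, 1), (1, 1), (1, 0)]]
print(ranks)  # [0, 1, 2, 3]
```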

If the indexes are stored in doubles:

  • If indexes may be negative, add some value to make everything positive and thus simplify the task.
  • Determine the smallest integer power of 2 that is greater than all the indexes, and divide all indexes by this value.
  • Multiply the index by 2^(number of bits needed for the current sorting step). Truncate the result, convert it to an integer, and use it for Hilbert ordering (interleave and compute the inverse Gray code).
  • Subtract the truncated integer i from the scaled index, so that only the fractional part is carried into the next step: index = index - i
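A small sketch of these steps for a single dimension, assuming the indexes are Python floats; `normalize` and `next_digit` are hypothetical helper names, not part of any library.

```python
def normalize(values):
    """Shift values to be non-negative, then scale them into [0, 1)."""
    shift = -min(min(values), 0.0)
    top = max(v + shift for v in values)
    scale = 1.0
    while scale <= top:        # smallest power of 2 greater than every value
        scale *= 2.0
    return [(v + shift) / scale for v in values]


def next_digit(frac, bits):
    """Split the next `bits` bits off a value in [0, 1).

    Returns the integer piece and the remaining fraction for the next step.
    """
    scaled = frac * (1 << bits)
    i = int(scaled)            # truncate
    return i, scaled - i


print(normalize([-1.0, 0.0, 3.0]))  # [0.0, 0.125, 0.5]
print(next_digit(0.8125, 2))        # (3, 0.25)
```

Calling `next_digit` again on the returned fraction yields the next piece, so each object only pays for as many pieces as its bin needs.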

Coming to your variant of radix sort, I'd suggest extending zsort (to make hilbertsort out of zsort) with two binary arrays of size d (one used mostly as a stack, the other used to invert index bits) and a rotation value (used to rearrange dimensions).

If the top value on the stack is 1, change pivotize(... ascending) to pivotize(... descending). Then, for the first part of the recursion, push this top value onto the stack; for the second part, push its inverse. The stack should be restored after each recursion. It contains the "decision tree" of the last d recursions of the radix sort procedure (in inverse Gray code).

After d recursions, this "decision tree" stack should be used to recalculate both the rotation value and the array of inversions. Exactly how to do this is non-trivial. It may be found in the following links: hilbert.c or hilbert.c.

Evgeny Kluev answered Oct 10 '22 05:10



You can compute the Hilbert curve index directly as a function f(x) = y, without using recursion, L-systems, or divide and conquer. Basically it is a Gray code or Hamiltonian path traversal. You can find a good description on Nick's spatial index Hilbert curve quadtree blog or in the book Hacker's Delight. Or take a look at monotonic n-ary Gray codes. I've written an implementation in PHP, including a Moore curve.
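As an illustration of such a direct, non-recursive computation in 2D, here is a Python port of a widely circulated iterative conversion. It is not the PHP implementation mentioned above, and the name `xy2d` and the requirement that `n` be a power of two are assumptions of this sketch.

```python
def xy2d(n, x, y):
    """Hilbert index of cell (x, y) on an n x n grid, n a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)   # Gray-code style quadrant rank
        if ry == 0:                    # rotate/flip into the sub-cell's frame
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


print([xy2d(2, x, y) for x, y in [(0, 0), (0, 1), (1, 1), (1, 0)]])  # [0, 1, 2, 3]
```

Note that this needs the grid resolution `n` up front, which is exactly the fixed precision the question is trying to avoid; it is mainly useful as a reference to check a lazy sort against.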

Micromega answered Oct 10 '22 04:10
