I have a set of 2D points stored in an array. I need to compress it as much as I can. Speed would be nice, but it's not a deal breaker; compression ratio is the goal. The rules are:
Special properties:
What I've researched so far:
I only managed to implement Huffman and BWT, but neither gives me a good compression ratio (nor exploits the main property of my data set). I'm going to try the first option today.
I'm sure there are better ideas. Do you have any? Have you come across anything similar and implemented something really good?
Dataset example, in hex:
00 0A 00 77 00 55 00 80 00 2B 00 B9 00 7A 00 5B
00 F6 00 76 00 B4 00 25 00 47 00 D3 00 F6 00 7D
...
01 05 00 A9 01 B8 00 10 01 4F 00 6A 01 E6 00 DF
01 1F 00 F0 01 BE 00 C3 01 6C 00 87 01 CE 00 44
...
...
15 06 03 F4 15 1E 03 29 15 35 03 10 15 B9 03 22
15 67 03 73 15 EF 03 5C 15 29 03 B8 15 4C 03 2F
...
where e.g. the particle 15 67 03 73
(last row) means a particle at x = 15 + 67/256, y = 3 + 73/256. As you can see, the data is somewhat ordered, but the fractional parts are in total disarray.
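To make the layout above concrete, here is a minimal sketch of decoding that 8.8 fixed-point format: each point is four bytes, `x_int x_frac y_int y_frac`. The function name `decode_points` is my own, just for illustration.

```python
# Sketch: decode the 8.8 fixed-point layout described above.
# Each point is 4 bytes: x_int, x_frac, y_int, y_frac.
def decode_points(data: bytes):
    """Yield (x, y) pairs as floats from the raw byte stream."""
    for i in range(0, len(data), 4):
        xi, xf, yi, yf = data[i:i + 4]
        yield (xi + xf / 256.0, yi + yf / 256.0)

raw = bytes.fromhex("15670373")
print(list(decode_points(raw)))  # [(21.40234375, 3.44921875)]
```

So `15 67 03 73` decodes to x = 21 + 103/256, y = 3 + 115/256 in decimal.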
The first option from the OP is more appropriate, but it can be improved further.
If there is some pattern in the points' distribution, you could try more advanced compressors for step #4: LZ*, BWT, or PPM.
Here are the results of an experimental comparison of the methods used in step 4. A worst-case scenario is assumed: points are uniformly randomly distributed in the range 00.00 .. FF.FF (so the only compression opportunity is to discard information about their ordering). All results are computed for 250,000 points:
method compressed size
------ ---------------
Uncompressed: 1000000
Elias4: 522989
Elias3: 495371
Elias2: 505376
Golomb12: 479802
Golomb13: 472238
Golomb14: 479431
Golomb15: 501422
FSE1: 455367
FSE2: 454120
FSE3: 453862
I didn't try Huffman encoding. FSE is a method similar to arithmetic coding. The numbers after the method names are configuration parameters: for Elias coding, how many bits are used to encode each number's bit length; for Golomb coding, how many least significant bits are left uncompressed; for FSE, how many most significant bits are compressed (along with the bit length).
All the results were produced by this source: http://ideone.com/ulmPzO
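As an illustration of the "GolombK" rows above, here is a sketch of Rice coding (Golomb with a power-of-two divisor): the k least significant bits are stored verbatim and the remaining high bits in unary. This is my own toy version with bit strings, not the code behind the table.

```python
# Sketch of Rice coding (Golomb with power-of-two divisor): the k least
# significant bits are stored verbatim, the high bits are written in unary.
def rice_encode(n: int, k: int) -> str:
    """Return the Rice code of a nonnegative integer n as a bit string."""
    q, r = n >> k, n & ((1 << k) - 1)
    return "1" * q + "0" + format(r, f"0{k}b")  # unary quotient, then k raw bits

def rice_decode(bits: str, k: int) -> int:
    """Inverse of rice_encode for a single code word."""
    q = bits.index("0")                 # count leading 1s
    r = int(bits[q + 1:q + 1 + k], 2)   # k verbatim low bits
    return (q << k) | r

code = rice_encode(12345, 12)  # "111" + "0" + 12 low bits of 12345
assert rice_decode(code, 12) == 12345
```

The parameter k trades off between the unary and verbatim parts, which is why Golomb12..15 give different sizes in the table.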
Interleave the bits representing the X and Y coordinates of every point, sort and compress.
For instance, if you have the point (X, Y) represented by the two 16-bit numbers
(X15X14X13X12X11X10X9X8X7X6X5X4X3X2X1X0, Y15Y14Y13Y12Y11Y10Y9Y8Y7Y6Y5Y4Y3Y2Y1Y0)
convert it into the following 32-bit number:
X15Y15X14Y14X13Y13X12Y12X11Y11X10Y10X9Y9X8Y8X7Y7X6Y6X5Y5X4Y4X3Y3X2Y2X1Y1X0Y0
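The interleaving above (a Morton / Z-order code) can be computed with the usual bit-spreading trick; a sketch, with X in the odd (high) bit positions as in the diagram:

```python
# Sketch: interleave two 16-bit coordinates into one 32-bit Morton code,
# with X bits in the odd (higher) positions, matching the layout above.
def part1by1(v: int) -> int:
    """Spread the 16 bits of v so they occupy the even bit positions."""
    v &= 0xFFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v

def morton(x: int, y: int) -> int:
    """32-bit code X15 Y15 X14 Y14 ... X0 Y0."""
    return (part1by1(x) << 1) | part1by1(y)
```

For example, `morton(0xFFFF, 0)` puts all ones in the odd positions, giving `0xAAAAAAAA`.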
That would take advantage of any clustering that may appear in the data, as physically near points will appear in nearby positions on the sorted list and their representations will share their leading bits.
Update: The point is to have near points sort into near positions. If you mix the X and Y bits, you get that, resulting in long sequences of 32-bit integers with identical values in their leading bits. If you then take deltas, you will have smaller values than if you just sorted on X and then on Y (or vice versa).
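The sort-then-delta step can be sketched in a few lines; when neighboring codes share leading bits, the differences are small and entropy-code well. These helper names are mine, for illustration:

```python
# Sketch: sort the interleaved 32-bit codes and store first-order deltas;
# shared leading bits cancel, leaving small values for the entropy coder.
def delta_encode(codes: list) -> list:
    codes = sorted(codes)
    return [c - p for c, p in zip(codes, [0] + codes[:-1])]

def delta_decode(deltas: list) -> list:
    """Running sum recovers the sorted code list."""
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out

vals = [7, 3, 5, 1000]
assert delta_decode(delta_encode(vals)) == sorted(vals)
```

Note that sorting discards the original ordering of the points, which is exactly the information the OP said may be lost.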
The thing is that you can then treat the data as a k-d tree: every bit partitions the space (left/right or up/down). For the first levels, you can compress by just stating how many elements there are on one side, until you get down to nodes with just a few elements, which can be represented by stating their remaining few bits explicitly. For the best compression you will have to use arithmetic coding.