
Compression algorithm for sorted integers

I have a large sequence of random integers sorted from lowest to highest. The numbers range in size from 1 bit up to nearly 45 bits. Near the beginning of the list the numbers are very close to each other: 4, 20, 23, 40, 66. But as the numbers grow, the gaps between them grow as well (the gap sizes themselves are random). There are no duplicate numbers.

I'm using bit packing to save some space. Nonetheless, this file can get really big.

I would like to know what kind of compression algorithm can be used in this situation, or any other technique to save as much space as possible.

Thank you.

Frederico Schardong asked Sep 30 '12

1 Answer

One option that worked well for me in the past was to store a list of 64-bit integers as 8 separate lists of 8-bit values: first the high 8 bits of every number, then the next 8 bits, and so on. For example, say you have the following 32-bit numbers:

0x12345678
0x12349785
0x13111111
0x13444444

The data stored would be (in hex):

12,12,13,13
34,34,11,44
56,97,11,44
78,85,11,44

I then ran that through the deflate compressor.

I don't recall what compression ratios I was able to achieve with this, but it was significantly better than compressing the numbers themselves.
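The transposition described above can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation; the function name `transpose_bytes` and the fixed 4-byte width are assumptions for the example.

```python
import zlib

def transpose_bytes(values, width=4):
    # Group the big-endian bytes of each integer by position:
    # all the highest bytes first, then the next-highest, etc.
    planes = [bytearray() for _ in range(width)]
    for v in values:
        for i in range(width):
            planes[i].append((v >> (8 * (width - 1 - i))) & 0xFF)
    return b"".join(planes)

values = [0x12345678, 0x12349785, 0x13111111, 0x13444444]
transposed = transpose_bytes(values)
print(transposed.hex())  # 12121313 34341144 56971144 78851144 (concatenated)

# The transposed stream keeps similar bytes near each other, which
# gives a general-purpose compressor like deflate more repetition
# to exploit than the raw interleaved numbers would.
compressed = zlib.compress(transposed, level=9)
```

Since the input is sorted, the high-order byte planes change very slowly (long runs of identical bytes), which is exactly the kind of redundancy deflate compresses well.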

Jim Mischel answered Sep 20 '22