Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Improve number compression algorithm?

Tags:

algorithm

I have many unique numbers, all positive and the order doesn't matter, 0 < num < 2^32.
Example: 23 56 24 26

The biggest, 56, needs 6 bits space. So, I need: 4*6 = 24 bits in total.

I do the following to save space:
I sort them first: 23 24 26 56 (because the order doesn't matter)
Now I get the difference of each from the previous: 23 1 2 30

The biggest, 30, needs 5 bits space.
After this I store all the numbers in 4*5 bits = 20 bits space.

Question: how to further improve this algorithm?

More information: Since requested, the numbers are mostly on the range of 2.000-4.000. Numbers less than 300 are pretty rare. Numbers more than 16.000 are pretty rare also. Generally speaking, all the numbers will be close. For example, they may be all in the 1.000-2.000 range or they may all be in the 16.000-20.000 range. The total number of numbers will be something in the range of 500-5.000.

like image 850
Luka Avatar asked Dec 26 '22 14:12

Luka


1 Answers

Your first step is good one to take because sorting reduces the differences to least. Here is a way to improve your algorithm:

  1. sort and calculate differences as you have done.
  2. Use Huffman coding on it.

Use of Huffman coding is more important then your step; I'll show you why:

consider the following data:

1 2 3 4 5 6 7 4294967295

where 4294967295 = 2^32-1. Using your algorithm:

1 1 1 1 1 1 1 4294967288

total bits needed is still 32*8

Using Huffman coding, the frequencies are:

1 => 7
4294967288 => 1

Huffman codes are 1 => 0 and 4294967288 => 1

Total bits needed = 7*1 + 1 = 8 bits

Huffman coding reduces size by 32*8/8 = 32 times

like image 131
Vikram Bhat Avatar answered Jan 05 '23 21:01

Vikram Bhat