 

Good compression algorithm for small chunks of data? (around 2k in size)

I have a system with one machine generate small chunks of data in the form of objects containing arrays of integers and longs. These chunks get passed to another server which in turn distributes them elsewhere.

I want to compress these objects so the memory load on the pass-through server is reduced. I understand that compression algorithms like deflate need to build a dictionary so something like that wouldn't really work on data this small.

Are there any algorithms that could compress data like this efficiently?

If not, another thing I could do is batch these chunks into arrays of objects and compress the array once it gets to be a certain size. But I am reluctant to do this because I would have to change interfaces in an existing system. Compressing them individually would not require any interface changes, the way this is all set up.

Not that I think it matters, but the target system is Java.
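
(For concreteness, the batching alternative I mentioned would be roughly the hypothetical wrapper below; none of these names exist in the current system, and a real version would also record each chunk's length so boundaries survive decompression.)

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.zip.GZIPOutputStream;

    // Hypothetical sketch of the batching alternative: collect chunks until a
    // size threshold is reached, then compress the whole batch in one go.
    class ChunkBatcher {
        private final List<byte[]> batch = new ArrayList<>();
        private int batchedBytes = 0;
        private final int threshold;

        ChunkBatcher(int threshold) { this.threshold = threshold; }

        // Returns a compressed batch once the threshold is reached, otherwise null.
        byte[] add(byte[] chunk) throws IOException {
            batch.add(chunk);
            batchedBytes += chunk.length;
            if (batchedBytes < threshold) return null;

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
                for (byte[] c : batch) gzip.write(c);   // chunk lengths omitted in this sketch
            }
            batch.clear();
            batchedBytes = 0;
            return out.toByteArray();
        }
    }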

Edit: Would Elias gamma coding be the best for this situation?

Thanks

asked Sep 29 '11 by marathon


2 Answers

If you think that reducing your data packet to its entropy level is the best you can hope for, you can try simple Huffman compression.

For an early look at how well this would compress, you can pass a packet through Huff0: http://fastcompression.blogspot.com/p/huff0-range0-entropy-coders.html

It is a simple order-0 Huffman encoder, so the result will be representative.
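
If you would rather stay inside the JDK than run an external tool, java.util.zip.Deflater with the HUFFMAN_ONLY strategy gives a rough stand-in for an order-0 Huffman coder (not an exact equivalent of Huff0, and the helper below is just a sketch):

    import java.util.zip.Deflater;

    public class HuffmanEstimate {
        // Rough estimate of how far entropy-style coding shrinks one packet.
        // HUFFMAN_ONLY disables LZ77 string matching, so only symbol frequencies matter.
        static int huffmanOnlySize(byte[] packet) {
            Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION, true); // raw stream, no zlib header
            deflater.setStrategy(Deflater.HUFFMAN_ONLY);
            deflater.setInput(packet);
            deflater.finish();

            byte[] out = new byte[packet.length + 64];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(out);       // only the byte count matters here
            }
            deflater.end();
            return total;
        }

        public static void main(String[] args) {
            byte[] sample = new byte[2048];           // stand-in for a ~2k packet
            java.util.Arrays.fill(sample, (byte) 7);  // highly redundant on purpose
            System.out.println("compressed size: " + huffmanOnlySize(sample));
        }
    }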

For more specific ideas on how to exploit the characteristics of your data efficiently, it would help to describe a bit what data the packets contain and how it is generated (as you have done in the comments, so they are ints (4 bytes?) and longs (8 bytes?)), and then provide one or a few samples.

answered Nov 09 '22 by Cyan


It sounds like you're currently looking at general-purpose compression algorithms. The most effective way to compress small chunks of data is to build a special-purpose compressor that knows the structure of your data.

The important thing is that you need to match the coding you use with the distribution of values you expect from your data: to get a good result from Elias gamma coding, you need to make sure the values you code are smallish positive integers...
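
For illustration, a bare-bones Elias gamma encoder could look like the sketch below (my own illustration, not a library; it assumes strictly positive ints and emits the bits as a String purely for readability, where a real coder would pack them into bytes):

    public class EliasGamma {
        // Elias gamma code for a strictly positive integer:
        // floor(log2(n)) leading zeros, then the binary form of n itself.
        static String encode(int n) {
            if (n <= 0) throw new IllegalArgumentException("gamma coding needs n >= 1");
            String binary = Integer.toBinaryString(n);   // always starts with a 1 bit
            StringBuilder bits = new StringBuilder();
            for (int i = 0; i < binary.length() - 1; i++) {
                bits.append('0');                        // length prefix in unary
            }
            return bits.append(binary).toString();
        }

        public static void main(String[] args) {
            // Small values get short codes: 1 -> "1", 2 -> "010", 9 -> "0001001"
            for (int n : new int[] {1, 2, 3, 9, 64}) {
                System.out.println(n + " -> " + encode(n));
            }
        }
    }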

If different integers within the same block are not completely independent (e.g., if your arrays represent a time series), you may be able to exploit that to improve your compression (e.g., the differences between successive values in a time series tend to be smallish signed integers). However, because each block needs to be independently compressed, you will not be able to exploit differences between successive blocks in the same way.
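
Under that time-series assumption, the per-block preprocessing could be a delta pass plus a zig-zag map, so the signed differences become small non-negative values that something like gamma coding handles well (a sketch, not a prescription; gamma coding needs values >= 1, so you would add one before coding):

    public class DeltaZigZag {
        // Per-block preprocessing for a time-series-like long[]:
        // store the differences between neighbours, then zig-zag map them so
        // small negative and positive deltas both become small non-negative values.
        static long[] deltaZigZag(long[] values) {
            long[] out = new long[values.length];
            long previous = 0;
            for (int i = 0; i < values.length; i++) {
                long delta = values[i] - previous;
                out[i] = (delta << 1) ^ (delta >> 63);   // zig-zag: ...,-2,-1,0,1,2,... -> ...,3,1,0,2,4,...
                previous = values[i];
            }
            return out;
        }

        public static void main(String[] args) {
            long[] series = {1000, 1003, 1001, 1001, 1010};
            for (long v : deltaZigZag(series)) {
                System.out.print(v + " ");               // 2000 6 3 0 18 -- small except the first
            }
            System.out.println();
        }
    }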


If you're worried that your compressor might turn into an "expander", you can add an initial flag to indicate whether the data is compressed or uncompressed. Then, in the worst case where your data doesn't fit your compression model at all, you can always punt and send the uncompressed version; your worst-case overhead is the size of the flag...
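
A sketch of that fallback, with arbitrary flag values and deflate standing in for whatever compressor you end up using:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.DeflaterOutputStream;

    public class MaybeCompress {
        private static final byte RAW = 0;         // arbitrary flag values for this sketch
        private static final byte COMPRESSED = 1;

        // Prefix one flag byte; fall back to the raw bytes whenever compression expands.
        static byte[] pack(byte[] chunk) throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            buffer.write(COMPRESSED);
            try (DeflaterOutputStream deflate = new DeflaterOutputStream(buffer)) {
                deflate.write(chunk);
            }
            if (buffer.size() < chunk.length + 1) {
                return buffer.toByteArray();        // compression actually helped
            }
            byte[] fallback = new byte[chunk.length + 1];
            fallback[0] = RAW;                      // worst case: one byte of overhead
            System.arraycopy(chunk, 0, fallback, 1, chunk.length);
            return fallback;
        }
    }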

answered Nov 09 '22 by comingstorm