 

Good compression algorithm for small chunks of data? (around 2k in size)

I have a system with one machine generate small chunks of data in the form of objects containing arrays of integers and longs. These chunks get passed to another server which in turn distributes them elsewhere.

I want to compress these objects so the memory load on the pass-through server is reduced. I understand that compression algorithms like deflate need to build a dictionary so something like that wouldn't really work on data this small.

Are there any algorithms that could compress data like this efficiently?

If not, another thing I could do is batch these chunks into arrays of objects and compress the array once it gets to be a certain size. But I am reluctant to do this because I would have to change interfaces in an existing system. Compressing them individually would not require any interface changes, the way this is all set up.

Not that I think it matters, but the target system is Java.
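
(For concreteness, the batching alternative I mentioned would be roughly the hypothetical wrapper below; none of these names exist in the current system, and a real version would also record each chunk's length so boundaries survive decompression.)

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.zip.GZIPOutputStream;

    // Hypothetical sketch of the batching alternative: collect chunks until a
    // size threshold is reached, then compress the whole batch in one go.
    class ChunkBatcher {
        private final List<byte[]> batch = new ArrayList<>();
        private int batchedBytes = 0;
        private final int threshold;

        ChunkBatcher(int threshold) { this.threshold = threshold; }

        // Returns a compressed batch once the threshold is reached, otherwise null.
        byte[] add(byte[] chunk) throws IOException {
            batch.add(chunk);
            batchedBytes += chunk.length;
            if (batchedBytes < threshold) return null;

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
                for (byte[] c : batch) gzip.write(c);   // chunk lengths omitted in this sketch
            }
            batch.clear();
            batchedBytes = 0;
            return out.toByteArray();
        }
    }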

Edit: Would Elias gamma coding be the best for this situation?

Thanks

asked Sep 29 '11 by marathon


2 Answers

If you think that reducing your data packet to its entropy level is the best you can hope for, you can try simple Huffman compression.

For an early look at how well this would compress, you can pass a packet through Huff0: http://fastcompression.blogspot.com/p/huff0-range0-entropy-coders.html

It is a simple order-0 Huffman encoder, so the result will be representative.
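
If you would rather stay inside the JDK than run an external tool, java.util.zip.Deflater with the HUFFMAN_ONLY strategy gives a rough stand-in for an order-0 Huffman coder (not an exact equivalent of Huff0, and the helper below is just a sketch):

    import java.util.zip.Deflater;

    public class HuffmanEstimate {
        // Rough estimate of how far entropy-style coding shrinks one packet.
        // HUFFMAN_ONLY disables LZ77 string matching, so only symbol frequencies matter.
        static int huffmanOnlySize(byte[] packet) {
            Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION, true); // raw stream, no zlib header
            deflater.setStrategy(Deflater.HUFFMAN_ONLY);
            deflater.setInput(packet);
            deflater.finish();

            byte[] out = new byte[packet.length + 64];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(out);       // only the byte count matters here
            }
            deflater.end();
            return total;
        }

        public static void main(String[] args) {
            byte[] sample = new byte[2048];           // stand-in for a ~2k packet
            java.util.Arrays.fill(sample, (byte) 7);  // highly redundant on purpose
            System.out.println("compressed size: " + huffmanOnlySize(sample));
        }
    }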

For more specific ideas on how to exploit the characteristics of your data efficiently, it would help to describe a bit what data the packets contain and how it is generated (as you have done in the comments, so they are ints (4 bytes?) and longs (8 bytes?)), and then provide one or a few samples.

answered Nov 09 '22 by Cyan


It sounds like you're currently looking at general-purpose compression algorithms. The most effective way to compress small chunks of data is to build a special-purpose compressor that knows the structure of your data.

The important thing is that you need to match the coding you use with the distribution of values you expect from your data: to get a good result from Elias gamma coding, you need to make sure the values you code are smallish positive integers...
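
For illustration, a bare-bones Elias gamma encoder could look like the sketch below (my own illustration, not a library; it assumes strictly positive ints and emits the bits as a String purely for readability, where a real coder would pack them into bytes):

    public class EliasGamma {
        // Elias gamma code for a strictly positive integer:
        // floor(log2(n)) leading zeros, then the binary form of n itself.
        static String encode(int n) {
            if (n <= 0) throw new IllegalArgumentException("gamma coding needs n >= 1");
            String binary = Integer.toBinaryString(n);   // always starts with a 1 bit
            StringBuilder bits = new StringBuilder();
            for (int i = 0; i < binary.length() - 1; i++) {
                bits.append('0');                        // length prefix in unary
            }
            return bits.append(binary).toString();
        }

        public static void main(String[] args) {
            // Small values get short codes: 1 -> "1", 2 -> "010", 9 -> "0001001"
            for (int n : new int[] {1, 2, 3, 9, 64}) {
                System.out.println(n + " -> " + encode(n));
            }
        }
    }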

If different integers within the same block are not completely independent (e.g., if your arrays represent a time series), you may be able to exploit that to improve your compression (e.g., the differences between successive values in a time series tend to be smallish signed integers). However, because each block needs to be independently compressed, you will not be able to exploit differences between successive blocks in the same way.
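
Under that time-series assumption, the per-block preprocessing could be a delta pass plus a zig-zag map, so the signed differences become small non-negative values that something like gamma coding handles well (a sketch, not a prescription; gamma coding needs values >= 1, so you would add one before coding):

    public class DeltaZigZag {
        // Per-block preprocessing for a time-series-like long[]:
        // store the differences between neighbours, then zig-zag map them so
        // small negative and positive deltas both become small non-negative values.
        static long[] deltaZigZag(long[] values) {
            long[] out = new long[values.length];
            long previous = 0;
            for (int i = 0; i < values.length; i++) {
                long delta = values[i] - previous;
                out[i] = (delta << 1) ^ (delta >> 63);   // zig-zag: ...,-2,-1,0,1,2,... -> ...,3,1,0,2,4,...
                previous = values[i];
            }
            return out;
        }

        public static void main(String[] args) {
            long[] series = {1000, 1003, 1001, 1001, 1010};
            for (long v : deltaZigZag(series)) {
                System.out.print(v + " ");               // 2000 6 3 0 18 -- small except the first
            }
            System.out.println();
        }
    }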


If you're worried that your compressor might turn into an "expander", you can add an initial flag to indicate whether the data is compressed or uncompressed. Then, in the worst case where your data doesn't fit your compression model at all, you can always punt and send the uncompressed version; your worst-case overhead is the size of the flag...
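
A sketch of that fallback, with arbitrary flag values and deflate standing in for whatever compressor you end up using:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.DeflaterOutputStream;

    public class MaybeCompress {
        private static final byte RAW = 0;         // arbitrary flag values for this sketch
        private static final byte COMPRESSED = 1;

        // Prefix one flag byte; fall back to the raw bytes whenever compression expands.
        static byte[] pack(byte[] chunk) throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            buffer.write(COMPRESSED);
            try (DeflaterOutputStream deflate = new DeflaterOutputStream(buffer)) {
                deflate.write(chunk);
            }
            if (buffer.size() < chunk.length + 1) {
                return buffer.toByteArray();        // compression actually helped
            }
            byte[] fallback = new byte[chunk.length + 1];
            fallback[0] = RAW;                      // worst case: one byte of overhead
            System.arraycopy(chunk, 0, fallback, 1, chunk.length);
            return fallback;
        }
    }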

answered Nov 09 '22 by comingstorm