
Best compression technique for binary data? [closed]

Tags:

compression

I have a large binary file that represents the alpha channel for each pixel in an image: 0 for transparent, 1 for anything else. This binary data needs to be dynamically loaded from a text file, and it would be useful to get the maximum possible compression on it. Decompression times aren't hugely important (unless we're talking a jump from, say, a minute to an hour), but the files need to be as small as possible.

Methods we've tried so far are run-length encoding, then Huffman coding, then converting the binary data to base64, and run-length encoding that differentiates between the two bit values by writing run lengths of ones as numbers and run lengths of zeros as their alphabetical equivalents (this seems to give the best results). However, we're wondering if there's a better solution than any of these, as we've been approaching it from a logical standpoint rather than surveying all possible methods.
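For reference, here is a minimal sketch of one reading of that last scheme: run lengths of ones written as decimal numbers, run lengths of zeros written as letters (a=1, b=2, ...). The exact mapping is an assumption, since the question doesn't spell it out.

```python
import itertools

def rle_mixed(bits: str) -> str:
    """Run-length encode a string of '0'/'1' characters.

    Runs of ones are written as decimal numbers; runs of zeros are
    written as letters (a=1, b=2, ..., z=26; longer runs are split).
    """
    out = []
    for bit, group in itertools.groupby(bits):
        n = sum(1 for _ in group)
        if bit == "1":
            out.append(str(n))                  # e.g. a run of 12 ones -> "12"
        else:
            while n > 26:                       # letters only cover 1..26
                out.append("z")
                n -= 26
            out.append(chr(ord("a") + n - 1))   # e.g. a run of 3 zeros -> "c"
    return "".join(out)

print(rle_mixed("0001111100"))  # -> "c5b"
```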

Asked by Jim, Dec 04 '10



1 Answer

As external libraries were out of the question, I created a custom solution for this. The system used run-length encoding to compress the data, then the RLE-encoded data was represented in base32 (32 characters for the zeroes, and a matching set for the ones). This allowed us to represent files of approximately 5MB in only around 30KB, without any loss.
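A minimal sketch of one way to read that description, assuming each run length is written in base 32 and the bit value is implied by which of two disjoint 32-character alphabets the digits come from. The specific alphabets below are made up for illustration, not the ones from the original system.

```python
import itertools

# Two disjoint 32-character alphabets (assumed): one for runs of zeros,
# one for runs of ones.  Run lengths are written in base 32 using the
# alphabet that matches the bit value, so no separators are needed.
ZERO_DIGITS = "abcdefghijklmnopqrstuvwxyz234567"
ONE_DIGITS  = "ABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^"

def encode(bits: str) -> str:
    out = []
    for bit, group in itertools.groupby(bits):
        n = sum(1 for _ in group)
        digits = ONE_DIGITS if bit == "1" else ZERO_DIGITS
        chunk = ""
        while n:                      # write the run length in base 32
            chunk = digits[n % 32] + chunk
            n //= 32
        out.append(chunk)
    return "".join(out)

def decode(text: str) -> str:
    out = []
    for is_one, group in itertools.groupby(text, key=lambda c: c in ONE_DIGITS):
        digits = ONE_DIGITS if is_one else ZERO_DIGITS
        n = 0
        for c in group:               # read the base-32 run length back
            n = n * 32 + digits.index(c)
        out.append(("1" if is_one else "0") * n)
    return "".join(out)

bits = "0" * 1000 + "1" * 37 + "0" * 5
assert decode(encode(bits)) == bits
```

Because consecutive runs always alternate between the two alphabets, the decoder never needs run separators, which keeps the output compact; the actual compression ratio will depend on how long the runs in the real alpha data are.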

Answered by Jim, Oct 26 '22