I have a large binary file that represents the alpha channel of an image, one value per pixel: 0 for transparent, 1 for anything else. This data needs to be loaded dynamically from a text file, so we want the maximum possible compression. Decompression time isn't very important (unless we're talking about a jump from, say, a minute to an hour), but the files need to be as small as possible.
The methods we've tried so far are (1) run-length encoding, followed by Huffman coding, followed by converting the binary output to base64; and (2) run-length encoding in which runs of ones are written as numbers and runs of zeros as the alphabetical equivalents of those numbers, which seems to give the best results. However, we're wondering if there's a better solution than either of these, since we arrived at them by reasoning the problem through rather than by surveying all the possible methods.
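A minimal sketch of the second scheme, assuming "alphabetical equivalents" means each decimal digit of a zero-run length is swapped for a letter (0→a, 1→b, …, 9→j); that mapping, and the function names `encode`/`decode`, are hypothetical, since the question doesn't spell them out. Because runs always alternate between 0 and 1, the switch between digits and letters marks where one run ends and the next begins, so no separator character is needed.

```python
from itertools import groupby

# Hypothetical digit-to-letter mapping for zero runs: "0"->"a", ..., "9"->"j".
DIGIT_TO_LETTER = str.maketrans("0123456789", "abcdefghij")
LETTER_TO_DIGIT = str.maketrans("abcdefghij", "0123456789")


def encode(bits: str) -> str:
    """Run-length encode a string of '0'/'1' characters.

    Runs of ones are written as decimal numbers; runs of zeros are written as
    the same decimal number with each digit replaced by a letter.
    """
    out = []
    for bit, run in groupby(bits):
        length = str(sum(1 for _ in run))
        out.append(length if bit == "1" else length.translate(DIGIT_TO_LETTER))
    return "".join(out)


def decode(text: str) -> str:
    """Invert encode(): each maximal group of digits or letters is one run."""
    out = []
    for is_digit, group in groupby(text, key=str.isdigit):
        token = "".join(group)
        if is_digit:
            out.append("1" * int(token))
        else:
            out.append("0" * int(token.translate(LETTER_TO_DIGIT)))
    return "".join(out)
```

For example, `encode("0" * 5 + "1" * 12 + "0" * 2)` returns `"f12c"`: five zeros become `f`, twelve ones become `12`, and two zeros become `c`.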
When you use COMPRESS=BINARY, patterns of multiple characters across the entire observation are compressed. Binary compression uses two techniques at the same time. This option searches for the following: repeating byte sequences (for example, 10 blank spaces or 10 zero bytes in a row).
There are two types of compression: lossless and lossy.
The most noteworthy benefit of lossy compression is a significantly smaller file size than lossless compression achieves, at the cost of discarding some of the original data. It is supported by many tools, plugins and software products that let the user choose the degree of compression.
As external libraries were out of the question, I created a custom solution. The system run-length encoded the data, then wrote the RLE output in base 32 using two separate 32-character alphabets: one set of characters for runs of zeros and a matching set for runs of ones. This allowed us to represent files of approximately 5 MB in only around 30 KB, without any loss.
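A minimal sketch of this RLE-plus-base-32 idea, assuming each run length is written in base 32 (most significant digit first) with the digits drawn from the alphabet that matches the run's bit value. The two alphabets below are hypothetical; the answer doesn't say which 64 characters it used, only that the two sets must not overlap. Since adjacent runs always have different bit values, a change of alphabet marks a run boundary, so the output needs no separators.

```python
import string
from itertools import groupby

# Hypothetical, non-overlapping alphabets (32 symbols each, 64 in total).
ZERO_ALPHABET = string.ascii_uppercase + "012345"  # digits for runs of 0
ONE_ALPHABET = string.ascii_lowercase + "6789+/"   # digits for runs of 1
assert len(ZERO_ALPHABET) == len(ONE_ALPHABET) == 32
assert not set(ZERO_ALPHABET) & set(ONE_ALPHABET)


def to_base32(n: int, alphabet: str) -> str:
    """Write a positive integer in base 32, most significant digit first."""
    digits = []
    while n:
        n, r = divmod(n, 32)
        digits.append(alphabet[r])
    return "".join(reversed(digits))


def encode(bits: str) -> str:
    """Run-length encode a '0'/'1' string, writing each run length in base 32
    with the alphabet that matches the run's bit value."""
    out = []
    for bit, run in groupby(bits):
        length = sum(1 for _ in run)
        out.append(to_base32(length, ONE_ALPHABET if bit == "1" else ZERO_ALPHABET))
    return "".join(out)


def decode(text: str) -> str:
    """Split the text wherever it switches alphabet; each group is one run."""
    out = []
    for is_one, group in groupby(text, key=lambda c: c in ONE_ALPHABET):
        alphabet = ONE_ALPHABET if is_one else ZERO_ALPHABET
        length = 0
        for c in group:
            length = length * 32 + alphabet.index(c)  # accumulate base-32 digits
        out.append(("1" if is_one else "0") * length)
    return "".join(out)
```

For example, a run of 1000 zeros followed by 40 ones encodes as `"5Ibi"` (1000 = 31·32 + 8 and 40 = 1·32 + 8). Each run costs roughly log₃₂ of its length in characters, which is why long uniform regions of an alpha mask shrink so dramatically.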