Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the best file compression of random binary data that you can achieve? [closed]

Specifically, what programs are out there and what has the highest compression ratio? I tried Googling it, but it seems experience would trump search results, so I ask.

like image 839
ox_n Avatar asked Jan 17 '11 17:01

ox_n


People also ask

What is the best file compression method?

The winner by pure compression is 7z, which isn't surprising to us. We've seen 7z come on the top of file compression benchmarks time and time again. If you want to compress something to use as little space as possible, you should definitely use 7z.

Can binary files be compressed?

False. Truly random data that is perfectly spread cannot be compressed.

What is the most powerful compression algorithm?

At maximum compression level, ZIPX is the fastest format, followed by RAR, ARC, and 7Z, ZPAQ being the slowest.


1 Answers

If file sizes could be specified accurate to the bit, for any file size N, there would be precisely 2^(N+1)-1 possible files of N bits or smaller. In order for a file of size X to be mapped to some smaller size Y, some file of size Y or smaller must be mapped to a file of size X or larger. The only way lossless compression can work is if some possible files can be identified as being more probable than others; in that scenario, the likely files will be shrunk and the unlikely ones will grow.

As a simple example, suppose that one wishes to store losslessly a file in which the bits are random and independent, but instead of 50% of the bits being set, only 33% are. One could compress such a file by taking each pair of bits and writing "0" if both bits were clear, "10" if the first bit was set and the second one not, "110" if the second was set and the first not, or "111" if both bits were set. The effect would be that each pair of bits would become one bit 44% of the time, two bits 22% of the time, and three bits 33% of the time. While some strings of data would grow, others would shrink; the pairs that shrank would--if the probability distribution was as expected--outnumber those that grow (4/9 files would shrink by a bit, 2/9 would stay the same, and 3/9 would grow, so pairs would on average shrink by 1/9 bit, and files would on average shrink by 1/18 [since the 1/9 figure was bits per pair]).

Note that if the bits actually had a 50% distribution, then only 25% of pairs would become one bit, 25% would stay two bits, and 50% would become three bits. Consequently, 25% of bits would shrink and 50% would grow, so pairs on average would grow by 25%, and files would grow by 12.5%. The break-even point would be about 38.2% of bits being set (two minus the golden mean), which would yield 38.2% of bit pairs shrinking and the same percentage growing.

like image 160
supercat Avatar answered Oct 12 '22 17:10

supercat