If I run
dd if=/dev/zero bs=8192 count=1000000 | gzip --best >test.gz
I get an output file of 7950131 bytes. So a compression ratio of ~1/1000. However I expected that gzip uses run length encoding + huffman codes and thus would achieve at least 1/100000. Why is it that it does not compress any better for an all zeroes file?
The maximum compression ratio of the deflate format is 1032:1. This is because the longest run that can be encoded is 258 bytes. At least two bits are required for each such run (one bit for the length code and one bit for the distance code), hence 4*258 = 1032 uncompressed bytes can be encoded per one compressed byte.
You can get more compression by gzipping the result of gzip. Normally that doesn't improve compression, but for very long runs it can.
By the way, the LZ77 approach used by deflate is more general than run-length encoding. Instead of just a length, a length/distance pair is used. This allows copying a string from some distance back, or replicating a byte as in run-length for a distance of one, or replicating triples of bytes with a distance of three, etc.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With