I am trying to better understand how the gzip compression levels (1-9) differ in the way the encoding is implemented.
I've looked at the zlib C source code, and it seems to come down to how exhaustive the search for the longest matching string is, but I'm looking for more specific information.
For example, do the levels yield any differences in the assignment of Huffman codes?
The levels differ only in how hard deflate looks for matching strings, as you observed. The Huffman coding is done on a chosen fixed number of symbols (literals and length/distance pairs), producing a "block", where that number is defined by the memory level, not the compression level. The Huffman codes generated will necessarily differ, since the symbols being coded will differ.
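If you want to see the effect of the level on your own data, here is a minimal sketch using Python's `zlib` module, which wraps the same deflate code (it emits a zlib header rather than a gzip one, but the compressed data is produced the same way). The `make_sample` and `sizes_by_level` helpers and the word list are made up for illustration; substitute your own input.

```python
import random
import zlib


def make_sample(n_words=20000, seed=0):
    """Build repeatable, text-like test data (hypothetical; use a real file instead)."""
    rng = random.Random(seed)
    vocab = [b"deflate", b"huffman", b"literal", b"length", b"distance",
             b"block", b"match", b"window", b"symbol", b"code"]
    return b" ".join(rng.choice(vocab) for _ in range(n_words))


def sizes_by_level(data):
    """Return the compressed size in bytes for each compression level 1 through 9."""
    return {level: len(zlib.compress(data, level)) for level in range(1, 10)}


if __name__ == "__main__":
    data = make_sample()
    for level, size in sizes_by_level(data).items():
        print(f"level {level}: {size} bytes (input: {len(data)} bytes)")
```

Every level decompresses to the same bytes; only the matches found, and therefore the symbols handed to the Huffman coder, change.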
The choice of memory level also has some effect on compression. A larger number of symbols spreads the cost of the code description for a block over more symbols, but too many symbols may prevent the Huffman codes from adapting to local changes in the statistics of the symbols. The default memory level is 8 (resulting in 16,383 symbols per block), since testing indicated that it gave better compression than memory level 9 (32,767 symbols per block). However, your mileage may vary.
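To try the memory level yourself, a rough sketch along these lines should work. The `memLevel` argument of `zlib.compressobj` is the real knob; `symbols_per_block`, `deflate_with` and the sample data are hypothetical names for illustration, and the formula in `symbols_per_block` is an assumption chosen to reproduce the two figures quoted above rather than something taken from the zlib documentation.

```python
import zlib


def symbols_per_block(mem_level):
    # Assumed relation: reproduces 16,383 for memLevel 8 and 32,767 for memLevel 9.
    return (1 << (mem_level + 6)) - 1


def deflate_with(data, mem_level, level=6):
    """Compress data with an explicit memory level; returns a zlib-wrapped stream."""
    comp = zlib.compressobj(level, zlib.DEFLATED, zlib.MAX_WBITS, mem_level)
    return comp.compress(data) + comp.flush()


if __name__ == "__main__":
    # Hypothetical sample input; results depend heavily on the data used.
    data = b" ".join(str(i * i).encode() for i in range(50000))
    for mem_level in (8, 9):
        size = len(deflate_with(data, mem_level))
        print(f"memLevel {mem_level}: {symbols_per_block(mem_level)} symbols/block, "
              f"{size} bytes")
```

Which memory level wins depends on the input, so measure on data that resembles what you actually compress.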