Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why use deflate instead of gzip for text files served by Apache?

What advantages do either method offer for html, css and javascript files served by a LAMP server. Are there better alternatives?

The server provides information to a map application using Json, so a high volume of small files.

See also Is there any performance hit involved in choosing gzip over deflate for http compression?

like image 395
Ken Avatar asked Dec 23 '08 10:12

Ken


People also ask

Is deflate better than gzip?

GZIP is a file format that uses DEFLATE internally, along with some interesting blocking, filtering heuristics, a header and a checksum. In general, the additional blocking and heuristics that GZIP uses give it better compression ratios than DEFLATE alone.

When should you not use gzip?

If you take a file that is 1300 bytes and compress it to 800 bytes, it's still transmitted in that same 1500 byte packet regardless, so you've gained nothing. That being the case, you should restrict the gzip compression to files with a size greater than a single packet, 1400 bytes (1.4KB) is a safe value.

What is gzip and deflate?

gzip is based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman coding. DEFLATE was intended as a replacement for LZW and other patent-encumbered data compression algorithms which, at the time, limited the usability of compress and other popular archivers.

Is ZIP better than gzip?

In general, GZIP is much better compared to ZIP, in terms of compression, especially when compressing a huge number of files. Software that use the ZIP format are capable of both archiving and compressing the files together.


1 Answers

Why use deflate instead of gzip for text files served by Apache?

The simple answer is don't.


RFC 2616 defines deflate as:

deflate The "zlib" format defined in RFC 1950 in combination with the "deflate" compression mechanism described in RFC 1951

The zlib format is defined in RFC 1950 as :

     0   1      +---+---+      |CMF|FLG|   (more-->)      +---+---+         0   1   2   3      +---+---+---+---+      |     DICTID    |   (more-->)      +---+---+---+---+       +=====================+---+---+---+---+      |...compressed data...|    ADLER32    |      +=====================+---+---+---+---+ 

So, a few headers and an ADLER32 checksum

RFC 2616 defines gzip as:

gzip An encoding format produced by the file compression program "gzip" (GNU zip) as described in RFC 1952 [25]. This format is a Lempel-Ziv coding (LZ77) with a 32 bit CRC.

RFC 1952 defines the compressed data as:

The format presently uses the DEFLATE method of compression but can be easily extended to use other compression methods.

CRC-32 is slower than ADLER32

Compared to a cyclic redundancy check of the same length, it trades reliability for speed (preferring the latter).

So ... we have 2 compression mechanisms that use the same algorithm for compression, but a different algorithm for headers and checksum.

Now, the underlying TCP packets are already pretty reliable, so the issue here is not Adler 32 vs CRC-32 that GZIP uses.


Turns out many browsers over the years implemented an incorrect deflate algorithm. Instead of expecting the zlib header in RFC 1950 they simply expected the compressed payload. Similarly various web servers made the same mistake.

So, over the years browsers started implementing a fuzzy logic deflate implementation, they try for zlib header and adler checksum, if that fails they try for payload.

The result of having complex logic like that is that it is often broken. Verve Studio have a user contributed test section that show how bad the situation is.

For example: deflate works in Safari 4.0 but is broken in Safari 5.1, it also always has issues on IE.


So, best thing to do is avoid deflate altogether, the minor speed boost (due to adler 32) is not worth the risk of broken payloads.

like image 150
Sam Saffron Avatar answered Sep 20 '22 15:09

Sam Saffron