Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Zlib-compatible compression streams?

Are System.IO.Compression.GZipStream or System.IO.Compression.Deflate compatible with zlib compression?

like image 382
Ben Collins Avatar asked Sep 16 '08 08:09

Ben Collins


People also ask

What compression does zlib use?

As of September 2018, zlib only supports one algorithm, called DEFLATE, which uses a combination of a variation of LZ77 (Lempel–Ziv 1977) and Huffman coding. This algorithm provides good compression on a wide variety of data with minimal use of system resources.

What formats does zlib support?

This package provides a pure interface for compressing and decompressing streams of data represented as lazy ByteString s. It uses the zlib C library so it has high performance. It supports the zlib , gzip and raw compression formats.

Does zlib support gzip?

For applications that require data compression, the functions in this module allow compression and decompression, using the zlib library. The zlib library has its own home page at https://www.zlib.net.

Is zlib and gzip the same?

The main difference between zlib and gzip wrapping is that the zlib wrapping is more compact, six bytes vs. a minimum of 18 bytes for gzip, and the integrity check, Adler-32, runs faster than the CRC-32 that gzip uses. Raw deflate is used by programs that read and write the .


2 Answers

I ran into this issue with Git objects. In that particular case, they store the objects as deflated blobs with a Zlib header, which is documented in RFC 1950. You can make a compatible blob by making a file that contains:

  • Two header bytes (CMF and FLG from RFC 1950) with the values 0x78 0x01
    • CM = 8 = deflate
    • CINFO = 7 = 32Kb window
    • FCHECK = 1 = checksum bits for this header
  • The output of the C# DeflateStream
  • An Adler32 checksum of the input data to the DeflateStream, big-endian format (MSB first)

I made my own Adler implementation

public class Adler32Computer
{
    private int a = 1;
    private int b = 0;

    public int Checksum
    {
        get
        {
            return ((b * 65536) + a);
        }
    }

    private static readonly int Modulus = 65521;

    public void Update(byte[] data, int offset, int length)
    {
        for (int counter = 0; counter < length; ++counter)
        {
            a = (a + (data[offset + counter])) % Modulus;
            b = (b + a) % Modulus;
        }
    }
}

And that was pretty much it.

like image 163
Blake Ramsdell Avatar answered Sep 17 '22 10:09

Blake Ramsdell


DotNetZip includes a DeflateStream, a ZlibStream, and a GZipStream, to handle RFC 1950, 1951, and 1952. The all use the DEFLATE Algorithm but the framing and header bytes are different for each one.

As an advantage, the streams in DotNetZip do not exhibit the anomaly of expanding data size under compression, reported against the built-in streams. Also, there is no built-in ZlibStream, whereas DotNetZip gives you that, for good interop with zlib.

like image 27
Cheeso Avatar answered Sep 19 '22 10:09

Cheeso