Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

golang/python zlib difference

Tags:

python

zlib

go

Debugging differences between Python's zlib and golang's zlib. Why don't the following have the same results?

compress.go:

package main

import (
    "compress/flate"
    "bytes"
    "fmt"
)


func compress(source string) []byte {
    w, _ := flate.NewWriter(nil, 7)
    buf := new(bytes.Buffer)

    w.Reset(buf)
    w.Write([]byte(source))
    w.Close()

    return buf.Bytes()
}


func main() {
    example := "foo"
    compressed := compress(example)
    fmt.Println(compressed)
}

compress.py:

from __future__ import print_function

import zlib


def compress(source):
    # golang zlib strips header + checksum
    compressor = zlib.compressobj(7, zlib.DEFLATED, -15)
    compressor.compress(source)
    # python zlib defaults to Z_FLUSH, but 
    # https://golang.org/pkg/compress/flate/#Writer.Flush
    # says "Flush is equivalent to Z_SYNC_FLUSH"
    return compressor.flush(zlib.Z_SYNC_FLUSH)


def main():
    example = u"foo"
    compressed = compress(example)
    print(list(bytearray(compressed)))


if __name__ == "__main__":
    main()

Results

$ go version
go version go1.7.3 darwin/amd64
$ go build compress.go
$ ./compress
[74 203 207 7 4 0 0 255 255]
$ python --version
$ python 2.7.12
$ python compress.py
[74, 203, 207, 7, 0, 0, 0, 255, 255]

The Python version has 0 for the fifth byte, but the golang version has 4 -- what's causing the different output?

like image 988
everial Avatar asked Nov 30 '16 16:11

everial


People also ask

Is zlib part of Python?

The zlib is a Python library that supports the zlib C library, a higher-level generalization for deflating lossless compression algorithms. The zlib library is used to lossless compress, which means there is no data loss between compression and decompression).

Is zlib same as gzip?

The main difference between zlib and gzip wrapping is that the zlib wrapping is more compact, six bytes vs. a minimum of 18 bytes for gzip, and the integrity check, Adler-32, runs faster than the CRC-32 that gzip uses. Raw deflate is used by programs that read and write the .

What is the use of zlib?

zlib (/ˈziːlɪb/ or "zeta-lib", /ˈziːtəˌlɪb/) is a software library used for data compression. zlib was written by Jean-loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compression program.

Is zlib compatible with gzip?

For applications that require data compression, the functions in this module allow compression and decompression, using the zlib library. The zlib library has its own home page at https://www.zlib.net.


1 Answers

The output from the python example isn't a "complete" stream, its just flushing the buffer after compressing the first string. You can get the same output from the Go code by replacing Close() with Flush():

https://play.golang.org/p/BMcjTln-ej

func compress(source string) []byte {
    buf := new(bytes.Buffer)
    w, _ := flate.NewWriter(buf, 7)
    w.Write([]byte(source))
    w.Flush()

    return buf.Bytes()
}

However, you are comparing output from zlib in python, which uses DEFLATE internally to produce a zlib format output, and flate in Go, which is a DEFLATE implementation. I don't know if you can get the python zlib library to output the raw, complete DEFLATE stream, but trying to get different libraries to output byte-for-byte matches of compressed data doesn't seem useful or maintainable. The output of the compression libraries is only guaranteed to be compatible, not identical.

like image 118
JimB Avatar answered Sep 23 '22 11:09

JimB