Debugging differences between Python's zlib and golang's zlib. Why don't the following have the same results?
compress.go
:
package main
import (
"compress/flate"
"bytes"
"fmt"
)
func compress(source string) []byte {
w, _ := flate.NewWriter(nil, 7)
buf := new(bytes.Buffer)
w.Reset(buf)
w.Write([]byte(source))
w.Close()
return buf.Bytes()
}
func main() {
example := "foo"
compressed := compress(example)
fmt.Println(compressed)
}
compress.py
:
from __future__ import print_function
import zlib
def compress(source):
# golang zlib strips header + checksum
compressor = zlib.compressobj(7, zlib.DEFLATED, -15)
compressor.compress(source)
# python zlib defaults to Z_FLUSH, but
# https://golang.org/pkg/compress/flate/#Writer.Flush
# says "Flush is equivalent to Z_SYNC_FLUSH"
return compressor.flush(zlib.Z_SYNC_FLUSH)
def main():
example = u"foo"
compressed = compress(example)
print(list(bytearray(compressed)))
if __name__ == "__main__":
main()
Results
$ go version
go version go1.7.3 darwin/amd64
$ go build compress.go
$ ./compress
[74 203 207 7 4 0 0 255 255]
$ python --version
$ python 2.7.12
$ python compress.py
[74, 203, 207, 7, 0, 0, 0, 255, 255]
The Python version has 0
for the fifth byte, but the golang version has 4
-- what's causing the different output?
The zlib is a Python library that supports the zlib C library, a higher-level generalization for deflating lossless compression algorithms. The zlib library is used to lossless compress, which means there is no data loss between compression and decompression).
The main difference between zlib and gzip wrapping is that the zlib wrapping is more compact, six bytes vs. a minimum of 18 bytes for gzip, and the integrity check, Adler-32, runs faster than the CRC-32 that gzip uses. Raw deflate is used by programs that read and write the .
zlib (/ˈziːlɪb/ or "zeta-lib", /ˈziːtəˌlɪb/) is a software library used for data compression. zlib was written by Jean-loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compression program.
For applications that require data compression, the functions in this module allow compression and decompression, using the zlib library. The zlib library has its own home page at https://www.zlib.net.
The output from the python example isn't a "complete" stream, its just flushing the buffer after compressing the first string. You can get the same output from the Go code by replacing Close()
with Flush()
:
https://play.golang.org/p/BMcjTln-ej
func compress(source string) []byte {
buf := new(bytes.Buffer)
w, _ := flate.NewWriter(buf, 7)
w.Write([]byte(source))
w.Flush()
return buf.Bytes()
}
However, you are comparing output from zlib in python, which uses DEFLATE internally to produce a zlib format output, and flate
in Go, which is a DEFLATE implementation. I don't know if you can get the python zlib library to output the raw, complete DEFLATE stream, but trying to get different libraries to output byte-for-byte matches of compressed data doesn't seem useful or maintainable. The output of the compression libraries is only guaranteed to be compatible, not identical.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With