I am interfacing with a server that requires data sent to it to be compressed with the Deflate algorithm (Huffman encoding + LZ77), and that also sends data which I need to Inflate.
I know that Python includes Zlib, and that the C libraries in Zlib support calls to Inflate and Deflate, but these apparently are not provided by the Python Zlib module. It does provide Compress and Decompress, but when I make a call such as the following:
result_data = zlib.decompress( base64_decoded_compressed_string )
I receive the following error:
Error -3 while decompressing data: incorrect header check
Gzip does no better; when making a call such as:
result_data = gzip.GzipFile( fileobj = StringIO.StringIO( base64_decoded_compressed_string ) ).read()
I receive the error:
IOError: Not a gzipped file
which makes sense as the data is a Deflated file not a true Gzipped file.
Now I know that there is a Deflate implementation available (Pyflate), but I do not know of an Inflate implementation.
It seems that there are a few options:
I am seeking a solution, but lacking a solution I will be thankful for insights, constructive opinions, and ideas.
Additional information: The result of deflating (and encoding) a string should, for the purposes I need, give the same result as the following snippet of C# code, where the input parameter is an array of UTF bytes corresponding to the data to compress:
    public static string DeflateAndEncodeBase64(byte[] data)
    {
        if (null == data || data.Length < 1) return null;
        string compressedBase64 = "";
        //write into a new memory stream wrapped by a deflate stream
        using (MemoryStream ms = new MemoryStream())
        {
            using (DeflateStream deflateStream = new DeflateStream(ms, CompressionMode.Compress, true))
            {
                //write byte buffer into memorystream
                deflateStream.Write(data, 0, data.Length);
                deflateStream.Close();
                //rewind memory stream and write to base 64 string
                byte[] compressedBytes = new byte[ms.Length];
                ms.Seek(0, SeekOrigin.Begin);
                ms.Read(compressedBytes, 0, (int)ms.Length);
                compressedBase64 = Convert.ToBase64String(compressedBytes);
            }
        }
        return compressedBase64;
    }
Running this .NET code for the string "deflate and encode me" gives the result
7b0HYBxJliUmL23Ke39K9UrX4HShCIBgEyTYkEAQ7MGIzeaS7B1pRyMpqyqBymVWZV1mFkDM7Z28995777333nvvvfe6O51OJ/ff/z9cZmQBbPbOStrJniGAqsgfP358Hz8iZvl5mbV5mi1nab6cVrM8XeT/Dw==
When "deflate and encode me" is run through Python's zlib.compress() and then Base64-encoded, the result is "eJxLSU3LSSxJVUjMS1FIzUvOT0lVyE0FAFXHB6k=".
It is clear that zlib.compress() is not an implementation of the same algorithm as the standard Deflate algorithm.
More Information:
The first 2 bytes of the .NET deflate data ("7b0HY..."), after b64 decoding are 0xEDBD, which does not correspond to Gzip data (0x1f8b), BZip2 (0x425A) data, or Zlib (0x789C) data.
The first 2 bytes of the Python compressed data ("eJxLS..."), after b64 decoding are 0x789C. This is a Zlib header.
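The header comparison above can be automated. The following is a minimal sketch, not part of the original question: the helper name `sniff_container` is hypothetical, and the check is only a heuristic, because a raw deflate stream has no signature of its own and can only be identified by elimination. The two inputs are the first eight Base64 characters of the .NET and Python outputs quoted above.

```python
import base64


def sniff_container(data: bytes) -> str:
    """Guess the container format from the first two bytes (heuristic only)."""
    if len(data) < 2:
        return "too short"
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    if data[:2] == b"BZ":  # 0x42 0x5A
        return "bzip2"
    cmf, flg = data[0], data[1]
    # RFC 1950: the low nibble of CMF is 8 (deflate) and
    # CMF * 256 + FLG must be a multiple of 31.
    if cmf & 0x0F == 8 and (cmf * 256 + flg) % 31 == 0:
        return "zlib"
    return "unknown (possibly raw deflate)"


# Prefix of the .NET output: starts with 0xED 0xBD.
print(sniff_container(base64.b64decode("7b0HYBxJ")))
# Prefix of the Python zlib.compress() output: starts with 0x78 0x9C.
print(sniff_container(base64.b64decode("eJxLSU3L")))
```

A result of "unknown" does not prove the data is raw deflate; it only rules out the three wrapped formats checked here.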
SOLVED
To handle the raw deflate and inflate, without header and checksum, the following things needed to happen:
On deflate/compress: strip the first two bytes (header) and the last four bytes (checksum).
On inflate/decompress: there is a second argument for window size. If this value is negative, zlib does not expect a header or checksum.
Here are my current methods, including the Base64 encoding/decoding, which work properly:
    import zlib
    import base64

    def decode_base64_and_inflate( b64string ):
        decoded_data = base64.b64decode( b64string )
        return zlib.decompress( decoded_data , -15)

    def deflate_and_base64_encode( string_val ):
        zlibbed_str = zlib.compress( string_val )
        compressed_string = zlibbed_str[2:-4]
        return base64.b64encode( compressed_string )
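The methods above were written for Python 2, where `str` is a byte string. A possible Python 3 adaptation is sketched below. It keeps the same function names and the same strip-the-wrapper approach. Encoding text input as UTF-8 is an assumption taken from the "array of UTF bytes" wording in the question, not something the server is known to require.

```python
import base64
import zlib


def decode_base64_and_inflate(b64string):
    """Base64-decode, then inflate a raw deflate stream (no header, no checksum)."""
    decoded_data = base64.b64decode(b64string)
    return zlib.decompress(decoded_data, -15)


def deflate_and_base64_encode(data):
    """Compress to zlib format, strip the wrapper, then Base64-encode."""
    if isinstance(data, str):
        data = data.encode("utf-8")  # assumption: text is sent as UTF-8
    zlibbed = zlib.compress(data)
    raw = zlibbed[2:-4]  # drop the 2-byte header and the 4-byte checksum
    return base64.b64encode(raw)


encoded = deflate_and_base64_encode("deflate and encode me")
print(decode_base64_and_inflate(encoded))
```

Both functions return `bytes`; call `.decode("utf-8")` on the inflated result if text is needed.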
You can still use the zlib module to inflate/deflate data. The gzip module uses it internally, but adds a file header to make it into a gzip file. Looking at the gzip.py file, something like this could work:
    import zlib

    def deflate(data, compresslevel=9):
        compress = zlib.compressobj(
            compresslevel,        # level: 0-9
            zlib.DEFLATED,        # method: must be DEFLATED
            -zlib.MAX_WBITS,      # window size in bits:
                                  #   -15..-8: negate, suppress header
                                  #   8..15: normal
                                  #   24..31: subtract 16, gzip header
            zlib.DEF_MEM_LEVEL,   # mem level: 1..8/9
            0                     # strategy:
                                  #   0 = Z_DEFAULT_STRATEGY
                                  #   1 = Z_FILTERED
                                  #   2 = Z_HUFFMAN_ONLY
                                  #   3 = Z_RLE
                                  #   4 = Z_FIXED
        )
        deflated = compress.compress(data)
        deflated += compress.flush()
        return deflated

    def inflate(data):
        decompress = zlib.decompressobj(
            -zlib.MAX_WBITS  # see above
        )
        inflated = decompress.decompress(data)
        inflated += decompress.flush()
        return inflated
I don't know if this corresponds exactly to whatever your server requires, but those two functions are able to round-trip any data I tried.
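If the server's data arrives in pieces rather than as one string, the same `decompressobj` can be fed incrementally. The sketch below is an illustration only: `inflate_chunks` is a hypothetical helper, and the 7-byte chunk size is arbitrary, chosen just to exercise the chunked path.

```python
import zlib


def inflate_chunks(chunks):
    """Yield inflated pieces from an iterable of raw deflate byte chunks."""
    # Negative window size: no zlib header or checksum is expected.
    decompressor = zlib.decompressobj(-zlib.MAX_WBITS)
    for chunk in chunks:
        piece = decompressor.decompress(chunk)
        if piece:
            yield piece
    tail = decompressor.flush()
    if tail:
        yield tail


# Build a raw deflate stream to test with, then feed it back 7 bytes at a time.
compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
payload = b"deflate and encode me " * 100
raw = compressor.compress(payload) + compressor.flush()
pieces = [raw[i:i + 7] for i in range(0, len(raw), 7)]
restored = b"".join(inflate_chunks(pieces))
print(restored == payload)
```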
The parameters map directly to what is passed to the zlib library functions.
Python ⇒ C
zlib.compressobj(...) ⇒ deflateInit(...)
compressobj.compress(...) ⇒ deflate(...)
zlib.decompressobj(...) ⇒ inflateInit(...)
decompressobj.decompress(...) ⇒ inflate(...)
The constructors create the structure and populate it with default values, and pass it along to the init-functions.
The compress/decompress methods update the structure and pass it to deflate/inflate.
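To make the effect of the window-size argument concrete, the sketch below compresses the same input three times, changing only that argument. The helper `compress_with_wbits` is hypothetical. The values used (negative for raw deflate, 8..15 for zlib, plus 16 for gzip) follow the Python zlib documentation.

```python
import zlib


def compress_with_wbits(data, wbits, level=9):
    """Compress data, letting wbits choose the wrapper around the deflate stream."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return compressor.compress(data) + compressor.flush()


data = b"deflate and encode me"
raw = compress_with_wbits(data, -zlib.MAX_WBITS)      # raw deflate, no wrapper
zl = compress_with_wbits(data, zlib.MAX_WBITS)        # zlib header + Adler-32
gz = compress_with_wbits(data, zlib.MAX_WBITS | 16)   # gzip header + CRC-32

for name, blob in (("raw", raw), ("zlib", zl), ("gzip", gz)):
    print(name, len(blob), blob[:2].hex())

# Decompression needs a matching window-size argument.
print(zlib.decompress(raw, -zlib.MAX_WBITS) == data)
print(zlib.decompress(zl, zlib.MAX_WBITS) == data)
print(zlib.decompress(gz, zlib.MAX_WBITS | 16) == data)
```

Only the raw variant is what a deflate-only peer such as the C# DeflateStream expects.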
This is an add-on to MizardX's answer, giving some explanation and background.
See http://www.chiramattel.com/george/blog/2007/09/09/deflatestream-block-length-does-not-match.html
According to RFC 1950, a zlib stream constructed in the default manner is composed of a 2-byte header, a deflate stream, and a 4-byte Adler-32 checksum of the uncompressed data.
The C# DeflateStream works on (you guessed it) a deflate stream. MizardX's code is telling the zlib module that the data is a raw deflate stream.
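The zlib layout from RFC 1950 (2-byte header, deflate data, 4-byte Adler-32 trailer) can be checked directly in Python. This is a small sketch for illustration; it assumes a stream produced by `zlib.compress()` with no preset dictionary, which is the default.

```python
import struct
import zlib

data = b"deflate and encode me"
stream = zlib.compress(data)

header = stream[:2]     # CMF and FLG bytes
body = stream[2:-4]     # the raw deflate stream
trailer = stream[-4:]   # Adler-32 of the uncompressed data, big-endian

(stored_adler,) = struct.unpack(">I", trailer)
print(header.hex())
print(stored_adler == zlib.adler32(data))
print(zlib.decompress(body, -zlib.MAX_WBITS) == data)
```

The middle part is the only piece a raw deflate consumer should receive.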
Observations: (1) One hopes the C# "deflation" method produces a longer string only for short input. (2) Using the raw deflate stream without the Adler-32 checksum is a bit risky, unless the checksum is replaced with something better.
Updates
error message Block length does not match with its complement
If you are trying to inflate some compressed data with the C# DeflateStream and you get that message, then it is quite possible that you are giving it a zlib stream, not a deflate stream.
See How do you use a DeflateStream on part of a file?
Also copy/paste the error message into a Google search and you will get numerous hits (including the one up the front of this answer) saying much the same thing.
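The same mismatch can be reproduced from Python by handing a zlib stream to a raw inflater. The sketch below uses a hypothetical helper, `try_raw_inflate`. The error text that zlib reports is worded differently from the .NET message and may vary between zlib builds, so the code prints it rather than depending on it.

```python
import zlib


def try_raw_inflate(blob):
    """Return (ok, result_or_message) for a raw-deflate inflate attempt."""
    try:
        return True, zlib.decompress(blob, -zlib.MAX_WBITS)
    except zlib.error as exc:
        return False, str(exc)


zlib_stream = zlib.compress(b"deflate and encode me")

# Wrong: the 2-byte zlib header gets interpreted as deflate data.
print(try_raw_inflate(zlib_stream))

# Right: strip the header and the Adler-32 trailer first.
print(try_raw_inflate(zlib_stream[2:-4]))
```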
The Java Deflater
... used by "the website" ... C# DeflateStream "is pretty straightforward and has been tested against the Java implementation". Which of the following possible Java Deflater constructors is the website using?
public Deflater(int level, boolean nowrap)
Creates a new compressor using the specified compression level. If 'nowrap' is true then the ZLIB header and checksum fields will not be used in order to support the compression format used in both GZIP and PKZIP.
public Deflater(int level)
Creates a new compressor using the specified compression level. Compressed data will be generated in ZLIB format.
public Deflater()
Creates a new compressor with the default compression level. Compressed data will be generated in ZLIB format.
A one-line deflater after throwing away the 2-byte zlib header and the 4-byte checksum:
uncompressed_string.encode('zlib')[2:-4] # does not work in Python 3.x
or
zlib.compress(uncompressed_string)[2:-4]