
Python decompressing gzip chunk-by-chunk

Tags: python, gzip, zlib

I have a memory- and disk-limited environment where I need to decompress the contents of a gzip file sent to me in string-based chunks (over xmlrpc binary transfer). However, zlib.decompress() and zlib.decompressobj()/decompress() both barf on the gzip header. I've tried offsetting past the gzip header (documented here), but still haven't managed to avoid the barf. The gzip library itself only seems to support decompressing from files.

The following snippet gives a simplified illustration of what I would like to do (except in real life the buffer will be filled from xmlrpc, rather than reading from a local file):

#! /usr/bin/env python

import zlib

CHUNKSIZE=1000

d = zlib.decompressobj()

f=open('23046-8.txt.gz','rb')
buffer=f.read(CHUNKSIZE)

while buffer:
  outstr = d.decompress(buffer)
  print(outstr)
  buffer=f.read(CHUNKSIZE)

outstr = d.flush()
print(outstr)

f.close()

Unfortunately, as I said, this barfs with:

Traceback (most recent call last):
  File "./test.py", line 13, in <module>
    outstr = d.decompress(buffer)
zlib.error: Error -3 while decompressing: incorrect header check

Theoretically, I could feed my xmlrpc-sourced data into a StringIO and then use that as a fileobj for gzip.GzipFile(). In real life, however, I don't have enough memory to hold the entire file contents as well as the decompressed data. I really do need to process it chunk-by-chunk.

The fall-back would be to change the compression of my xmlrpc-sourced data from gzip to plain zlib, but since that impacts other sub-systems I'd prefer to avoid it if possible.

Any ideas?

user291294 asked Mar 11 '10 09:03



2 Answers

gzip and zlib use slightly different headers.
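To make the header difference concrete, here is a small sketch (assuming Python 3, where gzip.compress() is available; the payload and variable names are made up for illustration). It compresses the same bytes both ways, prints the leading bytes of each stream, and shows that a default decompressobj() rejects the gzip one:

```python
import gzip
import zlib

# Hypothetical sample payload, only for illustration.
payload = b"hello, gzip and zlib"

gz_bytes = gzip.compress(payload)   # gzip container (RFC 1952)
zl_bytes = zlib.compress(payload)   # zlib container (RFC 1950)

# A gzip stream starts with the magic bytes 0x1f 0x8b; a zlib stream
# written with the default window size starts with 0x78.
print(gz_bytes[:2].hex())
print(zl_bytes[:1].hex())

# A default decompressobj() expects the zlib header, so feeding it
# gzip data raises zlib.error.
try:
    zlib.decompressobj().decompress(gz_bytes)
    header_error = None
except zlib.error as exc:
    header_error = str(exc)
print(header_error)
```

Both containers wrap the same deflate data, which is why telling zlib which header to expect is enough.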

See How can I decompress a gzip stream with zlib?

Try d = zlib.decompressobj(16+zlib.MAX_WBITS).

And you might try changing your chunk size to a power of 2 (say CHUNKSIZE=1024) for possible performance reasons.
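Putting that suggestion together, a chunk-by-chunk version might look roughly like the sketch below. It assumes Python 3; iter_decompressed and read_chunks are hypothetical helper names, and an in-memory BytesIO stands in for the xmlrpc source so the example can run on its own:

```python
import gzip
import io
import zlib

CHUNKSIZE = 1024


def iter_decompressed(chunks):
    """Yield decompressed pieces from an iterable of gzip-compressed chunks."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        out = d.decompress(chunk)
        if out:
            yield out
    # flush() returns whatever output is still buffered.
    tail = d.flush()
    if tail:
        yield tail


def read_chunks(fileobj, size=CHUNKSIZE):
    """Yield fixed-size chunks from a file-like object until it is exhausted."""
    while True:
        chunk = fileobj.read(size)
        if not chunk:
            break
        yield chunk


# Stand-in for the real data source (assumption: any iterable of bytes works).
original = b"some line of text\n" * 5000
source = io.BytesIO(gzip.compress(original))

# Joined here only to check the round trip; real code would process
# each piece as it arrives instead of collecting them.
restored = b"".join(iter_decompressed(read_chunks(source)))
print(restored == original)
```

One caveat: a single compressed chunk can expand to much more than CHUNKSIZE bytes. If output size matters, decompress() also accepts a max_length argument, with the leftover input available in unconsumed_tail.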

wisty answered Oct 05 '22 07:10


I've got a more detailed answer here: https://stackoverflow.com/a/22310760/1733117

d = zlib.decompressobj(zlib.MAX_WBITS|32) 

Per the documentation, this automatically detects the header (zlib or gzip).
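A quick way to check the auto-detection is to push both kinds of stream through the same decompressor settings. This is only a sketch, assuming Python 3; inflate_auto is a hypothetical helper, and the tiny chunk size is chosen to show that the header may be split across chunks:

```python
import gzip
import zlib

# Hypothetical sample payload, only for illustration.
payload = b"same payload, two containers"


def inflate_auto(data, chunk_size=16):
    """Decompress zlib- or gzip-wrapped data in small chunks."""
    # MAX_WBITS | 32 enables automatic zlib/gzip header detection.
    d = zlib.decompressobj(zlib.MAX_WBITS | 32)
    pieces = []
    for i in range(0, len(data), chunk_size):
        pieces.append(d.decompress(data[i:i + chunk_size]))
    pieces.append(d.flush())
    return b"".join(pieces)


from_gzip = inflate_auto(gzip.compress(payload))
from_zlib = inflate_auto(zlib.compress(payload))
print(from_gzip == payload, from_zlib == payload)
```

Note that this covers zlib and gzip headers only; as far as I know, raw deflate data with no header still needs a negative wbits value.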

dnozay answered Oct 05 '22 05:10