I have a byte array containing data that is compressed by gzip. Now I need to uncompress this data. How can this be achieved?
gzip-compressed files usually have the .gz file extension (in fact, I don't think I've ever seen a .gzip extension), but it's generally unsafe to rely on the file extension to determine the type of a file anyhow. The C library's gzip interface, i.e. zlib's gzopen/gzread/etc., will transparently read uncompressed files.
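Rather than trusting the extension, you can check the two-byte gzip magic number (0x1f 0x8b) and pass anything else through unchanged, which roughly mimics gzread's transparent behaviour. This is a minimal sketch assuming Python 3; the helper name maybe_gunzip is hypothetical.

```python
import gzip

# First two bytes of every gzip member (RFC 1952)
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(data):
    """Hypothetical helper: gunzip data if it looks like gzip, else return it as-is."""
    data = bytes(data)  # accept bytes, bytearray or memoryview
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


print(maybe_gunzip(gzip.compress(b"hello world")))  # b'hello world'
print(maybe_gunzip(b"plain text"))                  # b'plain text'
```

Sniffing the magic number only tells you the data claims to be gzip; a truncated or corrupt stream will still raise an error from gzip.decompress.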
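zlib.decompress(data, 15 + 32) should autodetect whether you have gzip data or zlib data.
zlib.decompress(data, 15 + 16) should work if gzip and barf (raise zlib.error) if zlib.
Here 15 is the base-2 logarithm of the 32 KiB window size; adding 32 tells zlib to detect either header automatically, while adding 16 tells it to expect a gzip header only.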
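To see both wbits variants side by side, here is a small Python 3 sketch that compresses the same payload with a gzip wrapper and a zlib wrapper and feeds each to zlib.decompress. The payload is made up for illustration.

```python
import gzip
import zlib

payload = b"hello world"
gz_data = gzip.compress(payload)  # gzip wrapper: header + CRC-32 trailer
zl_data = zlib.compress(payload)  # zlib wrapper: 2-byte header + Adler-32 trailer

# 15 + 32: autodetect either header
assert zlib.decompress(gz_data, 15 + 32) == payload
assert zlib.decompress(zl_data, 15 + 32) == payload

# 15 + 16: gzip header only
assert zlib.decompress(gz_data, 15 + 16) == payload
try:
    zlib.decompress(zl_data, 15 + 16)
except zlib.error as exc:
    print("zlib data rejected in gzip-only mode:", exc)
```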
Here it is with Python 2.7.1, creating a little gz file, reading it back, and decompressing it:
>>> import gzip, zlib
>>> f = gzip.open('foo.gz', 'wb')
>>> f.write(b"hello world")
11
>>> f.close()
>>> c = open('foo.gz', 'rb').read()
>>> c
'\x1f\x8b\x08\x08\x14\xf4\xdcM\x02\xfffoo\x00\xcbH\xcd\xc9\xc9W(\xcf/\xcaI\x01\x00\x85\x11J\r\x0b\x00\x00\x00'
>>> ba = bytearray(c)
>>> ba
bytearray(b'\x1f\x8b\x08\x08\x14\xf4\xdcM\x02\xfffoo\x00\xcbH\xcd\xc9\xc9W(\xcf/\xcaI\x01\x00\x85\x11J\r\x0b\x00\x00\x00')
>>> zlib.decompress(ba, 15+32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be string or read-only buffer, not bytearray
>>> zlib.decompress(bytes(ba), 15+32)
'hello world'
>>>
Python 3.x usage would be very similar.
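A rough Python 3 version of the session above might look like this. It writes foo.gz into a temporary directory rather than the current one (an assumption made so the sketch has no side effects). One difference from 2.7: Python 3's zlib accepts any bytes-like object, so the bytearray no longer needs converting.

```python
import gzip
import os
import tempfile
import zlib

# Write a small .gz file into a temporary directory, mirroring the 2.7 session
path = os.path.join(tempfile.mkdtemp(), "foo.gz")
with gzip.open(path, "wb") as f:
    f.write(b"hello world")

with open(path, "rb") as f:
    ba = bytearray(f.read())

# A bytearray is accepted directly; no bytes() conversion needed in Python 3
print(zlib.decompress(ba, 15 + 32))  # b'hello world'

# gzip.decompress() (Python 3.2+) is the higher-level route to the same result
print(gzip.decompress(bytes(ba)))    # b'hello world'
```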
Update based on your comment that you are running Python 2.2.1.
Sigh. That's not even the last release of Python 2.2. Anyway, continuing with the foo.gz file created as above:
Python 2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> strobj = open('foo.gz', 'rb').read()
>>> strobj
'\x1f\x8b\x08\x08\x14\xf4\xdcM\x02\xfffoo\x00\xcbH\xcd\xc9\xc9W(\xcf/\xcaI\x01\x00\x85\x11J\r\x0b\x00\x00\x00'
>>> import zlib
>>> zlib.decompress(strobj, 15+32)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
zlib.error: Error -2 while preparing to decompress data
>>> zlib.decompress(strobj, 15+16)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
zlib.error: Error -2 while preparing to decompress data
# OK, we can't use the back door method. Plan B: use the
# documented approach i.e. gzip.GzipFile with a file-like object.
>>> import gzip, cStringIO
>>> fileobj = cStringIO.StringIO(strobj)
>>> gzf = gzip.GzipFile('dummy-name', 'rb', 9, fileobj)
>>> gzf.read()
'hello world'
# Success. Now let's assume you have an array.array object-- which requires
# premeditation; they aren't created accidentally!
# The following code assumes subtype 'B' but should work for any subtype.
>>> import array, sys
>>> aaB = array.array('B')
>>> aaB.fromfile(open('foo.gz', 'rb'), sys.maxint)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
EOFError: not enough items in file
#### Don't panic, just read the fine manual
>>> aaB
array('B', [31, 139, 8, 8, 20, 244, 220, 77, 2, 255, 102, 111, 111, 0, 203, 72, 205, 201, 201, 87, 40, 207, 47, 202, 73, 1, 0, 133, 17, 74, 13, 11, 0, 0, 0])
>>> strobj2 = aaB.tostring()
>>> strobj2 == strobj
1 #### means True
# You can make a str object and use that as above.
# ... or you can plug it directly into StringIO:
>>> gzip.GzipFile('dummy-name', 'rb', 9, cStringIO.StringIO(aaB)).read()
'hello world'
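For anyone on Python 3, the same Plan B (GzipFile over a file-like object) and the array.array route look roughly like this. It is a sketch assuming Python 3.2+, where io.BytesIO takes the place of cStringIO and array.tobytes() replaces tostring() (which was removed in 3.9).

```python
import array
import gzip
import io

strobj = gzip.compress(b"hello world")

# Plan B: wrap the bytes in a file-like object and let GzipFile read it
with gzip.GzipFile(fileobj=io.BytesIO(strobj), mode="rb") as gzf:
    print(gzf.read())  # b'hello world'

# array.array('B') -> bytes via tobytes()
aaB = array.array("B", strobj)
strobj2 = aaB.tobytes()
assert strobj2 == strobj

with gzip.GzipFile(fileobj=io.BytesIO(strobj2), mode="rb") as gzf:
    print(gzf.read())  # b'hello world'
```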
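Apparently you can do this, but only when gziped_str holds raw DEFLATE data, i.e. with the gzip header and trailer already stripped. Prepending a fake zlib header ('x\x9c') to a complete gzip stream, which starts with '\x1f\x8b', fails with zlib.error:
import zlib
# ...
ungziped_str = zlib.decompressobj().decompress('x\x9c' + gziped_str)  # gziped_str must be raw DEFLATE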
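To see when the 'x\x9c' trick applies, here is a Python 3 sketch that builds a raw DEFLATE stream locally (negative wbits) and decompresses it three ways. The payload is made up. The trick works because decompressobj does not complain about the missing Adler-32 trailer; the stream just looks unfinished. Passing wbits=-15 states the intent directly, and a full gzip stream should use 16 + MAX_WBITS instead.

```python
import zlib

payload = b"hello world"

# Raw DEFLATE: negative wbits means no zlib or gzip wrapper
comp = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = comp.compress(payload) + comp.flush()

# Fake zlib header trick: works on raw DEFLATE only
assert zlib.decompressobj().decompress(b"x\x9c" + raw) == payload

# Cleaner: tell zlib the stream is raw DEFLATE
assert zlib.decompress(raw, -zlib.MAX_WBITS) == payload

# Complete gzip data: ask for a gzip header instead
gz = zlib.compressobj(9, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
gz_data = gz.compress(payload) + gz.flush()
assert zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(gz_data) == payload
```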
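Or this, but only if data is zlib-wrapped rather than gzip-wrapped; with the default wbits, gzip data raises zlib.error:
zlib.decompress(data)  # default wbits=15: expects a zlib header, not a gzip one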
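A quick Python 3 check of what the default wbits accepts, using a made-up payload:

```python
import gzip
import zlib

payload = b"hello world"

# Default wbits (zlib.MAX_WBITS == 15): a zlib header is expected
assert zlib.decompress(zlib.compress(payload)) == payload

# The same call on gzip data fails the header check
try:
    zlib.decompress(gzip.compress(payload))
except zlib.error as exc:
    print("gzip data needs different wbits:", exc)
else:
    raise AssertionError("expected zlib.error")
```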
For more info, see the Python docs.