
How to uncompress gzipped data in a byte array?

Tags:

python

I have a byte array containing data that is compressed by gzip. Now I need to uncompress this data. How can this be achieved?

Sylar asked May 25 '11 10:05



2 Answers

zlib.decompress(data, 15 + 32) should autodetect whether you have gzip data or zlib data.

zlib.decompress(data, 15 + 16) should work if the data is gzip and raise zlib.error if it is zlib.
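As a minimal sketch of the two wbits values above, assuming Python 3.2 or later (where gzip.compress is available); the names AUTO_DETECT and GZIP_ONLY are made up for this illustration:

```python
import gzip
import zlib

payload = b"hello world"
gz_bytes = gzip.compress(payload)    # gzip container (header starts with 1f 8b)
zlib_bytes = zlib.compress(payload)  # zlib container (header starts with 78)

AUTO_DETECT = 15 + 32  # hypothetical name: accept either gzip or zlib framing
GZIP_ONLY = 15 + 16    # hypothetical name: accept gzip framing only

auto_from_gzip = zlib.decompress(gz_bytes, AUTO_DETECT)
auto_from_zlib = zlib.decompress(zlib_bytes, AUTO_DETECT)
gzip_only_from_gzip = zlib.decompress(gz_bytes, GZIP_ONLY)

# A zlib stream is rejected when only gzip framing is allowed.
try:
    zlib.decompress(zlib_bytes, GZIP_ONLY)
    gzip_only_rejects_zlib = False
except zlib.error:
    gzip_only_rejects_zlib = True
```

The 15 is the maximum window size (zlib.MAX_WBITS); the added 16 or 32 selects which header format zlib expects.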

Here it is with Python 2.7.1, creating a little gz file, reading it back, and decompressing it:

>>> import gzip, zlib
>>> f = gzip.open('foo.gz', 'wb')
>>> f.write(b"hello world")
11
>>> f.close()
>>> c = open('foo.gz', 'rb').read()
>>> c
'\x1f\x8b\x08\x08\x14\xf4\xdcM\x02\xfffoo\x00\xcbH\xcd\xc9\xc9W(\xcf/\xcaI\x01\x00\x85\x11J\r\x0b\x00\x00\x00'
>>> ba = bytearray(c)
>>> ba
bytearray(b'\x1f\x8b\x08\x08\x14\xf4\xdcM\x02\xfffoo\x00\xcbH\xcd\xc9\xc9W(\xcf/\xcaI\x01\x00\x85\x11J\r\x0b\x00\x00\x00')
>>> zlib.decompress(ba, 15+32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be string or read-only buffer, not bytearray
>>> zlib.decompress(bytes(ba), 15+32)
'hello world'
>>>

Python 3.x usage would be very similar.
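For comparison, here is a rough Python 3 sketch of the same round trip (assuming Python 3.2 or later). It builds the gzip data in memory rather than writing foo.gz, which is a simplification made for this example. In Python 3, zlib.decompress accepts bytes-like objects, so the bytearray can be passed directly:

```python
import gzip
import zlib

# Build gzip data in memory and hold it in a bytearray, as in the question.
ba = bytearray(gzip.compress(b"hello world"))

# No bytes() conversion is needed in Python 3.
via_zlib = zlib.decompress(ba, 15 + 32)

# The gzip module also has a one-shot helper; converting to bytes first
# is harmless and keeps the input type unambiguous.
via_gzip = gzip.decompress(bytes(ba))

print(via_zlib)
print(via_gzip)
```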

Update based on comment that you are running Python 2.2.1.

Sigh. That's not even the last release of Python 2.2. Anyway, continuing with the foo.gz file created as above:

Python 2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> strobj = open('foo.gz', 'rb').read()
>>> strobj
'\x1f\x8b\x08\x08\x14\xf4\xdcM\x02\xfffoo\x00\xcbH\xcd\xc9\xc9W(\xcf/\xcaI\x01\x00\x85\x11J\r\x0b\x00\x00\x00'
>>> import zlib
>>> zlib.decompress(strobj, 15+32)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
zlib.error: Error -2 while preparing to decompress data
>>> zlib.decompress(strobj, 15+16)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
zlib.error: Error -2 while preparing to decompress data

# OK, we can't use the back door method. Plan B: use the 
# documented approach i.e. gzip.GzipFile with a file-like object.

>>> import gzip, cStringIO
>>> fileobj = cStringIO.StringIO(strobj)
>>> gzf = gzip.GzipFile('dummy-name', 'rb', 9, fileobj)
>>> gzf.read()
'hello world'

# Success. Now let's assume you have an array.array object-- which requires
# premeditation; they aren't created accidentally!
# The following code assumes subtype 'B' but should work for any subtype.

>>> import array, sys
>>> aaB = array.array('B')
>>> aaB.fromfile(open('foo.gz', 'rb'), sys.maxint)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
EOFError: not enough items in file
#### Don't panic, just read the fine manual
>>> aaB
array('B', [31, 139, 8, 8, 20, 244, 220, 77, 2, 255, 102, 111, 111, 0, 203, 72, 205, 201, 201, 87, 40, 207, 47, 202, 73, 1, 0, 133, 17, 74, 13, 11, 0, 0, 0])
>>> strobj2 = aaB.tostring()
>>> strobj2 == strobj
1 #### means True 
# You can make a str object and use that as above.

# ... or you can plug it directly into StringIO:
>>> gzip.GzipFile('dummy-name', 'rb', 9, cStringIO.StringIO(aaB)).read()
'hello world'
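
On a current Python 3, the same "Plan B" could look roughly like the sketch below. It assumes io.BytesIO as the replacement for cStringIO.StringIO and array.tobytes() as the replacement for tostring(); the data is generated in memory instead of being read from foo.gz.

```python
import array
import gzip
import io

gz_bytes = gzip.compress(b"hello world")

# File-like wrapper around the in-memory gzip data.
with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes), mode="rb") as gzf:
    from_bytes = gzf.read()

# Same idea starting from an array.array of unsigned bytes.
aaB = array.array("B", gz_bytes)
with gzip.GzipFile(fileobj=io.BytesIO(aaB.tobytes()), mode="rb") as gzf:
    from_array = gzf.read()

print(from_bytes, from_array)
```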
John Machin answered Oct 01 '22 00:10


Apparently you can do this:

import zlib
# ...
ungziped_str = zlib.decompressobj().decompress('x\x9c' + gziped_str)

Or this:

zlib.decompress( data ) # equivalent to gzdecompress()

For more info, look here: Python docs
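
A hedged note on the decompressobj idea, as a Python 3 sketch: as far as I can tell, prepending a zlib header such as 'x\x9c' only helps with raw deflate data, not with gzip data, which carries its own header. For gzip input, a decompressor object created with wbits 15 + 32 can be fed the data in pieces. The function name gunzip_in_chunks and the chunk size are made up for this illustration.

```python
import gzip
import zlib

def gunzip_in_chunks(data, chunk_size=4):
    """Decompress gzip (or zlib) data piece by piece (hypothetical helper)."""
    decomp = zlib.decompressobj(15 + 32)  # autodetect gzip/zlib header
    view = memoryview(data)               # slice without copying
    parts = []
    for start in range(0, len(view), chunk_size):
        parts.append(decomp.decompress(view[start:start + chunk_size]))
    parts.append(decomp.flush())
    return b"".join(parts)

restored = gunzip_in_chunks(bytearray(gzip.compress(b"hello world")))

# Raw deflate data (no header at all) is selected with a negative wbits.
comp = zlib.compressobj(9, zlib.DEFLATED, -15)
raw = comp.compress(b"hello world") + comp.flush()
from_raw = zlib.decompress(raw, -15)
```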

evgeny answered Oct 01 '22 01:10