
How to download .gz files with requests in Python without decoding it?

I am downloading a file using requests:

import requests

req = requests.get(url, stream=True)
with open(local_filename, 'wb') as f:
    for chunk in req.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)
            f.flush()

The problem with gzip files is that requests decodes them automatically, so I get the unpacked file on disk, while I need the original file.

Is there a way to tell requests not to do this?

funkifunki asked Sep 09 '14 16:09




2 Answers

import requests

r = requests.get(url, stream=True)
with open(local_filename, 'wb') as f:
    for chunk in r.raw.stream(1024, decode_content=False):
        if chunk:
            f.write(chunk)

This way, you avoid the automatic decompression of the gzip-encoded response and save it to the file exactly as it is received from the web server, chunk by chunk.
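To confirm that the file on disk is still compressed, one option is to check for the two-byte gzip magic number at the start of the file. The sketch below is only an illustration: the helper names `looks_like_gzip` and `read_gzip_text` are hypothetical and not part of requests, and the check only inspects the header, so it does not prove the whole file is valid.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def read_gzip_text(path, encoding="utf-8"):
    """Decompress the file at `path` and return its contents as text."""
    with gzip.open(path, "rt", encoding=encoding) as f:
        return f.read()
```

If `looks_like_gzip(local_filename)` returns False after the download, the response was probably decoded somewhere along the way.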

Boban P. answered Oct 17 '22 17:10


As discussed in the comments above, this seems to have solved the issue:

From the docs for the requests module:

Requests automatically decompresses gzip-encoded responses ... You can get direct access to the raw response (and even the socket), if needed as well.

Searching the docs for "raw responses" yields requests.Response.raw, which gives a file-like representation of the raw response stream.
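Because requests.Response.raw is file-like, it can also be copied straight to disk. The following is a minimal sketch, assuming the server sends the body with Content-Encoding: gzip and that the request is made with stream=True. The function name `download_raw` is made up for illustration.

```python
import shutil

import requests


def download_raw(url, local_filename):
    """Save the response body as sent, without undoing its Content-Encoding."""
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        # r.raw is the underlying urllib3 response object. Keeping
        # decode_content False means reads return the bytes as transferred.
        r.raw.decode_content = False
        with open(local_filename, "wb") as f:
            # Copy the file-like raw stream to disk in buffered chunks.
            shutil.copyfileobj(r.raw, f)
    return local_filename
```

Calling `download_raw(url, local_filename)` should leave the gzip-compressed bytes on disk, since nothing in this path decodes the stream.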

Dan Lenski answered Oct 17 '22 16:10