
Requests Gzip HTTP download and write to disk

I'm using the requests library with Python 2.7 to download a gzipped text file from a web API. Using the code below, I can successfully send a GET request and, judging from the headers, receive a response in the form of a gzip file.

I know Requests automatically decompresses a response if it detects from the headers that the response is gzipped. I wanted to take the download as a file stream and write the contents to disk for storage and future analysis.

When I open the resulting file in my working directory, however, I get characters like this: —}}¶— Q@Ï 'õ

For reference, some of the response headers include 'Content-Encoding': 'gzip', 'Content-Type': 'application/download', 'Accept-Encoding,User-Agent'

Am I wrong to write in binary? Am I not encoding the text correctly (i.e., could it be ASCII vs. UTF-8)? There is no apparent character encoding noted in the response headers.

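import requests

try:
    # stream=True defers downloading the body until it is iterated
    response = requests.get(url, params=paramDict, stream=True)
except Exception as e:
    print(e)

# Write the response body to disk in 1 KB chunks
with open(outName, 'wb') as out_file:
    for chunk in response.iter_content(chunk_size=1024):
        out_file.write(chunk)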
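
One way to tell whether unreadable output like this is still-compressed data rather than a character-encoding problem is to look at the first two bytes: a gzip stream always starts with 0x1f 0x8b. The following is only a minimal sketch; the helper name looks_gzipped is hypothetical, and the sample bytes are built locally instead of coming from the API.

```python
import gzip
import io

# Every gzip stream begins with these two bytes
GZIP_MAGIC = b'\x1f\x8b'


def looks_gzipped(data):
    """Return True if the bytes start with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# Build a small gzip payload in memory to stand in for the downloaded bytes
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
    gz.write(b'hello world\n')
sample = buf.getvalue()

print(looks_gzipped(sample))
print(looks_gzipped(b'hello world\n'))
```

If the check is true for the first bytes of the saved file, the content still needs to be decompressed before it can be read as text.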

EDIT 3.30.2016: I've now changed my code a little to use the gzipstream library. I tried using the stream to read the entire gzipped text file in my response content:

with open(outName, 'wb') as out_file, GzipStreamFile(response.content) as fileStream:
    streamContent = fileStream.read()
    out_file.write(streamContent)

I then received this error:

out_file.write(streamContent)
AttributeError: '_GzipStreamFile' object has no attribute 'close'

The output was an empty text file with the file name as anticipated. Do I need to initialize my streamContent variable outside of the with block so that it doesn't automatically try to call a close method at the end of the block?

EDIT 4.1.2016: Just thought I'd clarify that this DOES NOT have to be a stream; that was just one solution I encountered. I just want to make a daily request for this gzipped file and have it saved locally in plaintext.

jaxas asked Dec 25 '22 07:12


1 Answer

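import zlib

import requests

try:
    response = requests.get(url, params=paramDict)
except Exception as e:
    print(e)
    raise  # without a response there is nothing to decompress

# MAX_WBITS|32 makes zlib auto-detect a gzip or zlib header
data = zlib.decompress(response.content, zlib.MAX_WBITS|32)

# 'wb' writes the decompressed bytes unchanged
with open('outFileName.txt','wb') as outFile:
    outFile.write(data)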

The code above is what ended up working. As sigmavirus said, the file was gzipped to begin with. I knew this, but apparently did not describe it clearly enough, because I kept reading and writing the gzipped bytes.

Using the zlib module, I was able to decompress the content of the response all at once into the data variable; I then wrote that decompressed data to a file.
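
If the file is too large to hold in memory at once, the same zlib approach can be applied chunk by chunk with zlib.decompressobj. This is only a sketch under assumptions: decompress_chunks is a hypothetical helper, and the list of chunks below is built locally as a stand-in for something like response.iter_content(chunk_size=1024).

```python
import gzip
import io
import zlib


def decompress_chunks(chunks):
    """Yield decompressed bytes from an iterable of gzip-compressed chunks."""
    # MAX_WBITS | 32 auto-detects a gzip or zlib header
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 32)
    for chunk in chunks:
        piece = decompressor.decompress(chunk)
        if piece:
            yield piece
    # Emit anything still buffered once the input is exhausted
    tail = decompressor.flush()
    if tail:
        yield tail


# Stand-in for the downloaded data: gzip some text and slice it into chunks
original = b'line one\nline two\n' * 100
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
    gz.write(original)
compressed = buf.getvalue()
chunks = [compressed[i:i + 16] for i in range(0, len(compressed), 16)]

restored = b''.join(decompress_chunks(chunks))
print(restored == original)
```

In a real download, each yielded piece could be written straight to a file opened in 'wb' mode instead of being joined in memory.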

I'm not sure if this is the best or most Pythonic way to do this, but it worked. If anyone can explain why I cannot gzip.open this content (perhaps I needed a different method; I tried the gzipstream library to no avail), I would appreciate it, but I do consider this question answered.
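
A likely reason gzip.open did not work here: in Python 2.7 it expects a filename on disk, not bytes that are already in memory. One possible alternative is to wrap the bytes in io.BytesIO and pass that to gzip.GzipFile through its fileobj argument. A minimal sketch, where gunzip_bytes is a hypothetical helper and the sample payload is built locally in place of response.content:

```python
import gzip
import io


def gunzip_bytes(compressed):
    """Decompress gzip-compressed bytes that are held in memory."""
    # BytesIO gives GzipFile the file-like object it needs to read from
    with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode='rb') as gz:
        return gz.read()


# Stand-in for response.content: gzip a small payload in memory
text = b'some plaintext from the API\n'
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
    gz.write(text)
payload = buf.getvalue()

print(gunzip_bytes(payload) == text)
```

This reads the whole payload at once, so it suits the same case as the one-shot zlib.decompress call.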

Thanks to everyone who helped me; even if you didn't have the solution, you encouraged me to persevere!

jaxas answered Dec 27 '22 05:12