
Download and decompress gzipped file in memory?

I would like to download a file using urllib and decompress the file in memory before saving.

This is what I have right now:

    response = urllib2.urlopen(baseURL + filename)
    compressedFile = StringIO.StringIO()
    compressedFile.write(response.read())
    decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
    outfile = open(outFilePath, 'w')
    outfile.write(decompressedFile.read())

This ends up writing empty files. How can I achieve what I'm after?

Updated Answer:

    #! /usr/bin/env python2
    import urllib2
    import StringIO
    import gzip

    baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
    # check filename: it may change over time, due to new updates
    filename = "man-pages-5.00.tar.gz"
    outFilePath = filename[:-3]

    response = urllib2.urlopen(baseURL + filename)
    # Passing the data to the constructor leaves the position at 0,
    # so no explicit seek is needed here.
    compressedFile = StringIO.StringIO(response.read())
    decompressedFile = gzip.GzipFile(fileobj=compressedFile)

    # 'wb' because the decompressed tar archive is binary data
    with open(outFilePath, 'wb') as outfile:
        outfile.write(decompressedFile.read())
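For readers on Python 3, where urllib2 and the StringIO module are no longer available, one possible variation is to let gzip read from the response object directly and copy the output to disk in chunks. This is only a sketch under assumptions: the helper name save_decompressed is hypothetical, and the download step is shown as a comment rather than executed.

```python
import gzip
import shutil


def save_decompressed(source, out_path):
    """Decompress a gzip stream from a binary file-like object into out_path.

    `source` only needs a read() method, for example an HTTP response
    or an io.BytesIO buffer positioned at the start of the gzip data.
    """
    with gzip.GzipFile(fileobj=source, mode="rb") as gz, \
            open(out_path, "wb") as out:
        # Copy in chunks instead of holding the whole archive in memory.
        shutil.copyfileobj(gz, out)


# Hypothetical usage with a real download (assumed names, not run here):
#
#   import urllib.request
#   with urllib.request.urlopen(baseURL + filename) as response:
#       save_decompressed(response, outFilePath)
```

Because the helper accepts any binary file-like object, it can be exercised with an in-memory buffer before pointing it at a real URL.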
Asked by OregonTrail on Mar 12 '13 at 03:03





1 Answer

You need to seek to the beginning of compressedFile after writing to it but before passing it to gzip.GzipFile(). Writing leaves the file position at the end of the buffer, so the gzip module starts reading there, finds no data, and treats the buffer as an empty file. See below:

    #! /usr/bin/env python
    import urllib2
    import StringIO
    import gzip

    baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
    filename = "man-pages-3.34.tar.gz"
    outFilePath = "man-pages-3.34.tar"

    response = urllib2.urlopen(baseURL + filename)
    compressedFile = StringIO.StringIO()
    compressedFile.write(response.read())
    #
    # Set the file's current position to the beginning
    # of the file so that gzip.GzipFile can read
    # its contents from the top.
    #
    compressedFile.seek(0)

    decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')

    # 'wb' because the decompressed tar archive is binary data
    with open(outFilePath, 'wb') as outfile:
        outfile.write(decompressedFile.read())
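The effect of the missing seek can also be reproduced without any network access. The following is a minimal Python 3 sketch, assuming io.BytesIO as the replacement for StringIO.StringIO and a locally built payload standing in for response.read(); the helper name read_gzip_buffer is hypothetical.

```python
import gzip
import io

# Stand-in for response.read(): gzip-compressed bytes built locally.
payload = gzip.compress(b"hello, man pages\n")


def read_gzip_buffer(compressed, rewind):
    """Write compressed bytes into a buffer, optionally rewind, then decompress."""
    buf = io.BytesIO()
    buf.write(compressed)   # the position is now at the end of the buffer
    if rewind:
        buf.seek(0)         # move back to the start so gzip sees the data
    with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
        return gz.read()


without_seek = read_gzip_buffer(payload, rewind=False)
with_seek = read_gzip_buffer(payload, rewind=True)

print(len(without_seek), len(with_seek))
```

Without the rewind, gzip starts at the end of the buffer and returns no data, which matches the empty output files described in the question. When the compressed bytes are already in memory, gzip.decompress(payload) in Python 3 avoids the file-like wrapper altogether.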
Answered by crayzeewulf on Oct 11 '22 at 04:10
