I would like to download a file using urllib and decompress the file in memory before saving.
This is what I have right now:
    response = urllib2.urlopen(baseURL + filename)
    compressedFile = StringIO.StringIO()
    compressedFile.write(response.read())
    decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
    outfile = open(outFilePath, 'w')
    outfile.write(decompressedFile.read())
This ends up writing empty files. How can I achieve what I'm after?
Updated Answer:
    #! /usr/bin/env python2
    import urllib2
    import StringIO
    import gzip

    baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
    # check filename: it may change over time, due to new updates
    filename = "man-pages-5.00.tar.gz"
    outFilePath = filename[:-3]  # strip the ".gz" suffix

    response = urllib2.urlopen(baseURL + filename)
    # Passing the data to the constructor leaves the position at 0,
    # so no explicit seek is needed.
    compressedFile = StringIO.StringIO(response.read())
    decompressedFile = gzip.GzipFile(fileobj=compressedFile)

    # The decompressed tar archive is binary data, so open in 'wb' mode.
    with open(outFilePath, 'wb') as outfile:
        outfile.write(decompressedFile.read())
gzip-compressed files usually have the .gz file extension (in fact, I don't think I've ever seen a .gzip extension), but it's generally unsafe to rely on the file extension to determine the type of a file anyhow. The C library's gzip functions (zlib's gzopen/gzread/etc.) will transparently read uncompressed files.
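If you would rather check the content than trust the extension, one option is to look at the first two bytes: every gzip stream starts with the magic number 0x1f 0x8b. The following is a minimal Python 3 sketch of that idea; the helper names `looks_like_gzip` and `maybe_decompress` are made up for illustration and are not part of the gzip module.

```python
import gzip
import io

# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(data):
    """Return True if the byte string starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def maybe_decompress(data):
    """Decompress gzip data in memory; return anything else unchanged."""
    if looks_like_gzip(data):
        return gzip.GzipFile(fileobj=io.BytesIO(data)).read()
    return data
```

This only mimics the pass-through behaviour described above. A file that happens to start with those two bytes without being gzip data would still raise an error on read.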
You need to seek to the beginning of compressedFile after writing to it but before passing it to gzip.GzipFile(). Otherwise the gzip module will start reading from the end of the buffer, and it will appear as an empty file to it. See below:
    #! /usr/bin/env python
    import urllib2
    import StringIO
    import gzip

    baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
    filename = "man-pages-3.34.tar.gz"
    outFilePath = "man-pages-3.34.tar"

    response = urllib2.urlopen(baseURL + filename)
    compressedFile = StringIO.StringIO()
    compressedFile.write(response.read())

    #
    # Set the file's current position to the beginning
    # of the file so that gzip.GzipFile can read
    # its contents from the top.
    #
    compressedFile.seek(0)

    decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')

    # The decompressed tar archive is binary data, so open in 'wb' mode.
    with open(outFilePath, 'wb') as outfile:
        outfile.write(decompressedFile.read())
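The code above targets Python 2. On Python 3, urllib2 and StringIO no longer exist under those names, so a rough equivalent would use urllib.request and io.BytesIO instead. The sketch below is an adaptation under that assumption, not the original answer's code; the function names `decompress_in_memory` and `download_and_decompress` are hypothetical.

```python
import gzip
import io
import urllib.request


def decompress_in_memory(compressed_bytes):
    """Decompress gzip data held in memory and return the raw bytes."""
    buffer = io.BytesIO()
    buffer.write(compressed_bytes)
    # After write(), the position is at the end of the buffer.
    # Rewind so that GzipFile reads from the start.
    buffer.seek(0)
    with gzip.GzipFile(fileobj=buffer, mode="rb") as gz:
        return gz.read()


def download_and_decompress(url, out_path):
    """Download a .gz file, decompress it in memory, and save the result."""
    with urllib.request.urlopen(url) as response:
        data = decompress_in_memory(response.read())
    # Binary mode, because the decompressed payload is raw bytes.
    with open(out_path, "wb") as outfile:
        outfile.write(data)
```

Usage would look like `download_and_decompress(baseURL + filename, outFilePath)` with the same variables as above. Constructing the buffer as `io.BytesIO(compressed_bytes)` avoids the explicit seek, since the position then starts at 0.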