I need to read selected files, matching on the file name, from a remote zip archive using Python. I don't want to save the full zip to a temporary file (it's not that large, so I can handle everything in memory).
I've already written the code and it works, and I'm answering this myself so I can search for it later. But since evidence suggests that I'm one of the dumber participants on Stack Overflow, I'm sure there's room for improvement.
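For a local archive, with `file_name` holding its path, the whole archive can be listed and extracted like this:

```python
from zipfile import ZipFile

# opening the zip file in READ mode
with ZipFile(file_name, 'r') as zip:
    # printing all the contents of the zip file
    zip.printdir()
    # extracting all the files
    print('Extracting all the files now...')
    zip.extractall()
    print('Done!')
```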
To extract only a specific file into a specific location: import the `zipfile` module, create a zip file object with the `ZipFile` class, then call the `extract()` method on that object, passing the name of the file to extract and the path it should be extracted to.
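A minimal sketch of those steps. The helper name `extract_member` is hypothetical, and the archive is built in memory only so the example runs on its own. Note that `extract()` recreates the member's directory structure under the destination and returns the path of the file it wrote.

```python
import io
import os
import tempfile
import zipfile


def extract_member(archive, member, dest_dir):
    """Extract one named member of a zip archive into dest_dir.

    archive may be a path or a file-like object; returns the path of the
    extracted file as reported by ZipFile.extract().
    """
    with zipfile.ZipFile(archive, "r") as zf:
        return zf.extract(member, path=dest_dir)


# Build a small archive in memory so the example is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data/a.ranks", "1 alpha\n2 beta\n")
    zf.writestr("readme.txt", "not extracted\n")
buf.seek(0)

out_dir = tempfile.mkdtemp()
extracted = extract_member(buf, "data/a.ranks", out_dir)
print(os.path.relpath(extracted, out_dir))
```

Only the named member is written; `readme.txt` stays inside the archive.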
If you just want to save the file from the URL, you can do: `urllib.request.urlretrieve(url, filename)`.
Python's `zipfile` module can work directly with data in ZIP files: you can look at the list of items in the archive's directory and read the data files themselves without extracting them to disk.
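As an illustration of reading members in place, here is a small sketch that streams the lines of every member whose name ends with a given suffix, without extracting anything and without loading a whole member into memory at once. The function name `iter_lines` is hypothetical, and UTF-8 is an assumed default encoding for the members.

```python
import io
import zipfile


def iter_lines(zip_source, suffix, encoding="utf-8"):
    """Yield (member_name, line) for every text line of every member whose
    name ends with suffix, streaming from the archive without extracting."""
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            # Skip directory entries and members that don't match.
            if info.is_dir() or not info.filename.endswith(suffix):
                continue
            # zf.open() gives a binary stream; TextIOWrapper decodes it lazily.
            with zf.open(info) as raw:
                for line in io.TextIOWrapper(raw, encoding=encoding):
                    yield info.filename, line.rstrip("\n")


# Build a small archive in memory so the example runs on its own.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("x.ranks", "1 alpha\n2 beta\n")
    zf.writestr("notes.txt", "ignored\n")
buf.seek(0)

for name, line in iter_lines(buf, ".ranks"):
    print(name, line)
```

`zip_source` can be a path or any seekable binary file object, such as the `BytesIO` holding a downloaded archive.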
Here's how I did it (grabbing all files ending in ".ranks"):
```python
import urllib2, cStringIO, zipfile

try:
    remotezip = urllib2.urlopen(url)
    zipinmemory = cStringIO.StringIO(remotezip.read())
    zip = zipfile.ZipFile(zipinmemory)
    for fn in zip.namelist():
        if fn.endswith(".ranks"):
            ranks_data = zip.read(fn)
            for line in ranks_data.split("\n"):
                # do something with each line
                pass
except urllib2.HTTPError:
    # handle exception
    pass
```
Thanks Marcel for your question and answer (I had the same problem in a different context and encountered the same difficulty with file-like objects not really being file-like)! Just as an update: For Python 3.0, your code needs to be modified slightly:
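```python
import io
import urllib.error
import urllib.request
import zipfile

try:
    remotezip = urllib.request.urlopen(url)
    zipinmemory = io.BytesIO(remotezip.read())
    with zipfile.ZipFile(zipinmemory) as zip:
        for fn in zip.namelist():
            if fn.endswith(".ranks"):
                # read() returns bytes in Python 3, so decode before splitting
                # (UTF-8 is an assumption about the .ranks files)
                ranks_data = zip.read(fn).decode("utf-8")
                for line in ranks_data.splitlines():
                    # do something with each line
                    pass
except urllib.error.HTTPError:
    # handle exception
    pass
```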
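Since the question asks for matching on the file name, one possible refinement of the Python 3 version is sketched below: it separates downloading from filtering (so the filtering can be tested without a network), accepts a shell-style pattern such as `"*.ranks"` instead of a hard-coded suffix, and returns each matching member as `bytes`. The function names are hypothetical. `fnmatch.fnmatchcase` is used so matching is case-sensitive on every platform.

```python
import fnmatch
import io
import urllib.request
import zipfile


def read_matching(zip_bytes, pattern):
    """Return {member_name: bytes} for members whose base name matches the
    shell-style pattern, reading the archive entirely in memory."""
    result = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            # The ZIP format uses "/" as its path separator.
            base = name.rsplit("/", 1)[-1]
            # Directory entries end in "/", so their base name is empty.
            if base and fnmatch.fnmatchcase(base, pattern):
                result[name] = zf.read(name)
    return result


def read_matching_from_url(url, pattern, timeout=30):
    """Download the archive at url into memory, then filter it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return read_matching(resp.read(), pattern)


# Usage (the URL is a placeholder):
# for name, content in read_matching_from_url("https://example.com/data.zip", "*.ranks").items():
#     for line in content.decode("utf-8").splitlines():
#         ...  # do something with each line
```

Returning raw `bytes` leaves the choice of encoding to the caller, which matters because the archive itself does not record one for member contents.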
This will do the job without downloading the entire zip file!
http://pypi.python.org/pypi/pyremotezip
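For the no-full-download approach, here is a minimal sketch of the general technique (not pyremotezip's code or API). `ZipFile` only needs a seekable file object and locates the central directory at the end of the archive, so a file object that turns each read into an HTTP `Range` request lets it fetch just the directory and the members you ask for. `RangeReader`, `http_range_fetcher` and `open_remote_zip` are hypothetical names, and the sketch assumes the server answers `HEAD` with a `Content-Length` header and honours `Range` with `206 Partial Content`.

```python
import io
import urllib.request
import zipfile


class RangeReader(io.RawIOBase):
    """Read-only, seekable file object backed by a byte-range fetch function.

    fetch(start, end) must return the bytes from start to end inclusive;
    size is the total length of the remote file.
    """

    def __init__(self, fetch, size):
        self._fetch = fetch
        self._size = size
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = self._size + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if pos < 0:
            # Real files raise OSError here, and zipfile catches OSError.
            raise OSError("negative seek position")
        self._pos = pos
        return pos

    def readinto(self, b):
        # Nothing to read at or past EOF, or into an empty buffer.
        if self._pos >= self._size or len(b) == 0:
            return 0
        end = min(self._pos + len(b), self._size) - 1
        data = self._fetch(self._pos, end)
        n = len(data)
        b[:n] = data
        self._pos += n
        return n


def http_range_fetcher(url, timeout=30):
    """Return (fetch, size) for url, under the server assumptions above."""
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head, timeout=timeout) as resp:
        size = int(resp.headers["Content-Length"])

    def fetch(start, end):
        req = urllib.request.Request(url, headers={"Range": "bytes=%d-%d" % (start, end)})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            if resp.status != 206:
                raise OSError("server did not honour the Range header")
            return resp.read()

    return fetch, size


def open_remote_zip(url, buffer_size=64 * 1024):
    fetch, size = http_range_fetcher(url)
    # BufferedReader batches zipfile's many small reads into fewer, larger requests.
    return zipfile.ZipFile(io.BufferedReader(RangeReader(fetch, size), buffer_size))


# Usage (the URL is a placeholder):
# with open_remote_zip("https://example.com/archive.zip") as zf:
#     data = zf.read("some.ranks")
```

Because every unbuffered read becomes a separate request, the buffered wrapper matters in practice; for an archive small enough to hold in memory, downloading it whole is simpler and usually faster.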