How do I read selected files from a remote Zip archive over HTTP using Python?

Tags: python, http, zip

I need to read selected files, matching on the file name, from a remote zip archive using Python. I don't want to save the full zip to a temporary file (it's not that large, so I can handle everything in memory).

I've already written the code and it works, and I'm answering this myself so I can search for it later. But since evidence suggests that I'm one of the dumber participants on Stack Overflow, I'm sure there's room for improvement.

asked by Marcel Levy, Sep 18 '08 17:09




3 Answers

Here's how I did it (grabbing all files ending in ".ranks"):

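import urllib2, cStringIO, zipfile

url = "http://example.com/archive.zip"  # hypothetical: the remote archive's address

try:
    remotezip = urllib2.urlopen(url)
    # ZipFile needs a seekable file, so buffer the whole response in memory
    zipinmemory = cStringIO.StringIO(remotezip.read())
    archive = zipfile.ZipFile(zipinmemory)  # avoid shadowing the built-in zip()
    for fn in archive.namelist():
        if fn.endswith(".ranks"):
            ranks_data = archive.read(fn)
            for line in ranks_data.split("\n"):
                pass  # do something with each line
except urllib2.HTTPError:
    pass  # handle exception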
answered by Marcel Levy, Nov 07 '22 00:11



Thanks, Marcel, for your question and answer! I had the same problem in a different context and ran into the same difficulty with file-like objects not really being file-like. As an update: in Python 3 the code needs a few changes. urllib2 becomes urllib.request, and cStringIO.StringIO becomes io.BytesIO:

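import io
import urllib.error
import urllib.request
import zipfile

url = "http://example.com/archive.zip"  # hypothetical: the remote archive's address

try:
    with urllib.request.urlopen(url) as remotezip:
        # The HTTP response isn't seekable, so buffer it in a BytesIO
        zipinmemory = io.BytesIO(remotezip.read())
    with zipfile.ZipFile(zipinmemory) as archive:
        for fn in archive.namelist():
            if fn.endswith(".ranks"):
                # ZipFile.read returns bytes in Python 3; decode (assuming UTF-8) before splitting
                ranks_data = archive.read(fn).decode("utf-8")
                for line in ranks_data.splitlines():
                    pass  # do something with each line
except urllib.error.HTTPError:
    pass  # handle exception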
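If you want to reuse this or test it without a network connection, one option is to split the download from the in-memory extraction. This is a sketch, not part of the original answer: the helper names `read_matching_members` and `fetch_matching_members` are hypothetical, and it swaps the `endswith` check for shell-style wildcards via `fnmatch.fnmatchcase` (case-sensitive on every platform, and `*` also matches `/`, so `"*.ranks"` picks up members in subdirectories).

```python
from __future__ import annotations

import fnmatch
import io
import urllib.request
import zipfile


def read_matching_members(zip_bytes: bytes, pattern: str) -> dict[str, bytes]:
    """Return {member name: raw bytes} for members whose name matches pattern."""
    # BytesIO gives ZipFile the seekable file object it needs, entirely in memory
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            name: archive.read(name)
            for name in archive.namelist()
            # fnmatchcase: shell-style wildcards, case-sensitive everywhere
            if fnmatch.fnmatchcase(name, pattern)
        }


def fetch_matching_members(url: str, pattern: str) -> dict[str, bytes]:
    """Download the whole archive into memory, then pick members by name."""
    with urllib.request.urlopen(url) as response:
        return read_matching_members(response.read(), pattern)
```

For example, `fetch_matching_members(url, "*.ranks")` returns raw bytes per member name; decode each value (e.g. `.decode("utf-8").splitlines()`) when you need text lines.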
answered by Tim Pietzcker, Nov 07 '22 00:11



The pyremotezip package can do the job without downloading the entire zip file:

http://pypi.python.org/pypi/pyremotezip
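For the curious, the general idea behind reading a remote archive without downloading all of it: `zipfile.ZipFile` only needs a seekable file object. It reads the end-of-central-directory record, then the central directory, then only the members you ask for. If the server supports HTTP Range requests, each of those reads can become a ranged GET. The sketch below uses only the standard library. It is not pyremotezip's implementation or API. `RangeFile` and `open_remote_zip` are hypothetical names, and it assumes the server sends `Content-Length` on a HEAD request and answers Range requests with 206 Partial Content.

```python
import io
import urllib.request
import zipfile
from typing import Callable


class RangeFile(io.RawIOBase):
    """Hypothetical read-only, seekable file that fetches bytes on demand.

    fetch(start, end) must return bytes [start, end) of the underlying resource.
    """

    def __init__(self, size: int, fetch: Callable[[int, int], bytes]) -> None:
        super().__init__()
        self._size = size
        self._fetch = fetch
        self._pos = 0

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new_pos < 0:
            # zipfile catches OSError from out-of-range seeks when probing end records
            raise OSError("negative seek position")
        self._pos = new_pos
        return self._pos

    def readinto(self, buffer) -> int:
        end = min(self._pos + len(buffer), self._size)
        if self._pos >= end:
            return 0  # at or past EOF
        chunk = self._fetch(self._pos, end)
        buffer[: len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def open_remote_zip(url: str) -> zipfile.ZipFile:
    """Open a remote ZIP, fetching only the byte ranges zipfile asks for.

    Assumes the server reports Content-Length and honours Range requests.
    """
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        size = int(resp.headers["Content-Length"])

    def fetch(start: int, end: int) -> bytes:
        # HTTP byte ranges are inclusive, hence end - 1
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end - 1}"})
        with urllib.request.urlopen(req) as resp:
            if resp.status != 206:
                raise RuntimeError("server did not return 206 Partial Content")
            return resp.read()

    return zipfile.ZipFile(RangeFile(size, fetch))
```

Each small read here turns into its own HTTP request, so a practical version would probably want to cache fetched ranges or read ahead in larger chunks.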

answered by Filipe Varela, Nov 07 '22 00:11
