How do I read selected files from a remote Zip archive over HTTP using Python?

Tags: python, http, zip

I need to read selected files, matching on the file name, from a remote zip archive using Python. I don't want to save the full zip to a temporary file (it's not that large, so I can handle everything in memory).

I've already written the code and it works, and I'm answering this myself so I can search for it later. But since evidence suggests that I'm one of the dumber participants on Stack Overflow, I'm sure there's room for improvement.

asked Sep 18 '08 17:09

Marcel Levy

3 Answers

Here's how I did it (grabbing all files ending in ".ranks"):

```python
import urllib2, cStringIO, zipfile

# url is assumed to hold the address of the remote .zip file
try:
    remotezip = urllib2.urlopen(url)
    zipinmemory = cStringIO.StringIO(remotezip.read())
    zf = zipfile.ZipFile(zipinmemory)  # avoid shadowing the built-in zip()
    for fn in zf.namelist():
        if fn.endswith(".ranks"):
            ranks_data = zf.read(fn)
            for line in ranks_data.split("\n"):
                pass  # do something with each line
except urllib2.HTTPError:
    pass  # handle exception
```

answered Nov 07 '22 00:11

Marcel Levy
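The in-memory cStringIO copy is what makes this work. zipfile.ZipFile has to seek to the end of the archive to read its central directory, and the object returned by urlopen() cannot seek. Below is a minimal offline demonstration in Python 3. It assumes a hypothetical NonSeekable class as a stand-in for the HTTP response, and it builds a small archive in memory instead of downloading a real one:

```python
import io
import zipfile


class NonSeekable(io.RawIOBase):
    """Hypothetical stand-in for an HTTP response: readable, but not seekable."""

    def __init__(self, data: bytes):
        self._inner = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def readinto(self, buf) -> int:
        # Copy the next chunk into buf; seek()/tell() stay unsupported (inherited).
        chunk = self._inner.read(len(buf))
        buf[:len(chunk)] = chunk
        return len(chunk)


def try_open(fileobj) -> str:
    """Report whether zipfile can open the given file object."""
    try:
        with zipfile.ZipFile(fileobj) as zf:
            return "ok: " + ", ".join(zf.namelist())
    except (zipfile.BadZipFile, OSError) as exc:
        return "failed: " + type(exc).__name__


# Build a small archive in memory to play the role of the download.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zw:
    zw.writestr("week1.ranks", "alice 1\nbob 2\n")
payload = buf.getvalue()

print(try_open(NonSeekable(payload)))  # streaming object: zipfile cannot seek
print(try_open(io.BytesIO(payload)))   # seekable in-memory copy
```

The first call reports a failure. The second lists the member, which is why both answers copy the response body into an in-memory buffer before handing it to zipfile.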

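Thanks Marcel for your question and answer! I had the same problem in a different context and ran into the same difficulty with file-like objects not really being file-like: the object returned by `urlopen()` can't `seek()`, which `zipfile.ZipFile` needs, hence the in-memory copy. Just as an update: for Python 3, your code needs to be modified slightly (`urllib2` became `urllib.request`, and `cStringIO` is replaced by `io.BytesIO`, because the response body is bytes):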

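```python
import io
import urllib.error
import urllib.request
import zipfile

# url is assumed to hold the address of the remote .zip file
try:
    with urllib.request.urlopen(url) as remotezip:
        zipinmemory = io.BytesIO(remotezip.read())
    zf = zipfile.ZipFile(zipinmemory)  # avoid shadowing the built-in zip()
    for fn in zf.namelist():
        if fn.endswith(".ranks"):
            ranks_data = zf.read(fn)  # bytes in Python 3
            for line in ranks_data.decode("utf-8").splitlines():  # assumes UTF-8 text
                pass  # do something with each line
except urllib.error.HTTPError:
    pass  # handle exception
```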

answered Nov 07 '22 00:11

Tim Pietzcker
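If the same steps are needed in more than one place, they can be split into a download step and a filtering step. This also makes the filtering testable without a network connection. The sketch below rests on a few assumptions: fetch_zip and read_matching_members are hypothetical helper names, the matching members are assumed to be UTF-8 text, and the 30-second timeout is an arbitrary choice.

```python
import io
import urllib.request
import zipfile


def fetch_zip(url: str, timeout: float = 30.0) -> bytes:
    """Download the whole archive into memory (no temporary file)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def read_matching_members(zip_bytes: bytes, suffix: str) -> dict:
    """Return {member name: decoded text} for members whose name ends with suffix."""
    result = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            # Skip directory entries and anything that doesn't match the suffix.
            if info.is_dir() or not info.filename.endswith(suffix):
                continue
            # Assumption: the matching members are UTF-8 text files.
            result[info.filename] = zf.read(info).decode("utf-8")
    return result
```

Calling `read_matching_members(fetch_zip(url), ".ranks")` inside a `try` / `except urllib.error.HTTPError` block gives the same behaviour as the answer above. The archive is still held only in memory, and each value can be iterated with `splitlines()`.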

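This will do the job without downloading the entire zip file! It fetches only the parts of the archive it needs using HTTP Range requests, so the server has to support them.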

http://pypi.python.org/pypi/pyremotezip

answered Nov 07 '22 00:11

Filipe Varela
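The idea behind avoiding the full download is to ask the server for byte ranges. Below is a rough standard-library sketch of the same idea; it is not pyremotezip's own code or API. A seekable file object serves each `read()` with an HTTP Range request, so `zipfile.ZipFile` only fetches the central directory at the end of the archive and the members you actually read. `RangeFile` and `http_range_file` are hypothetical names. The sketch assumes the server answers a HEAD request with `Content-Length` and honours `Range` with `206 Partial Content`.

```python
import io
import urllib.request
import zipfile
from typing import Callable


class RangeFile(io.RawIOBase):
    """Hypothetical read-only, seekable file object backed by byte-range fetches."""

    def __init__(self, size: int, fetch: Callable[[int, int], bytes]):
        self._size = size    # total length of the remote archive in bytes
        self._fetch = fetch  # fetch(start, end) returns bytes start..end, inclusive
        self._pos = 0

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            self._pos = offset
        elif whence == io.SEEK_CUR:
            self._pos += offset
        elif whence == io.SEEK_END:
            self._pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        return self._pos

    def readinto(self, buf) -> int:
        # Fetch only the bytes needed to fill buf, stopping at end of file.
        end = min(self._pos + len(buf), self._size)
        if self._pos >= end:
            return 0
        data = self._fetch(self._pos, end - 1)
        buf[:len(data)] = data
        self._pos += len(data)
        return len(data)


def http_range_file(url: str, timeout: float = 30.0) -> RangeFile:
    """Wrap a URL in a RangeFile (assumes Content-Length and Range support)."""
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head, timeout=timeout) as resp:
        size = int(resp.headers["Content-Length"])

    def fetch(start: int, end: int) -> bytes:
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            if resp.status != 206:  # 200 would mean the whole file is coming back
                raise OSError("server ignored the Range header")
            return resp.read()

    return RangeFile(size, fetch)
```

Usage would look like `with zipfile.ZipFile(http_range_file(url)) as zf:` followed by the same `namelist()` / `endswith(".ranks")` loop. Each `read()` turns into one HTTP request, and zipfile issues several small reads. Wrapping the object in `io.BufferedReader` with a larger `buffer_size` would likely cut down the number of round trips.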
