Way to get a file list from a downloaded zip without loading entire zip file into memory?

Tags: python, zip

I am working on a script that runs on App Engine, so I have a RAM limit I need to stay within (on App Engine, that limit is 1024 MB).

I am downloading a large archive, from which I need to extract the file list. The archive itself is just a file I am storing for later offline use (if needed), but I need the file list because I check for changes in the zip archive each time I pull it.

Here is the code block I have now:

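import StringIO
import zipfile

import requests

url = 'http://url.to/archive.zip'
r = requests.get(url)  # downloads the whole response body into memory
file_mem = StringIO.StringIO(r.content)  # wraps those bytes in a file-like object
zip_file = zipfile.ZipFile(file_mem, 'r')

# get the list of files
file_list = zip_file.namelist() # list of files -- stored in memory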

With the StringIO object, it's placing the entire archive into memory. Is there a way I can go from my r.content object to a file list without placing the entire file into memory at once?

Asked Nov 10 '22 15:11 by user3058197


1 Answer

Well, how about downloading the file to disk and then using the zipfile module to parse it there? That should save you from having to keep the .zip contents all in memory, and hopefully it will work fine on a small App Engine instance.

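import zipfile
import urllib

url = 'http://url.to/archive.zip'
# Save the archive to disk (Python 2; on Python 3 this is urllib.request.urlretrieve)
urllib.urlretrieve(url, 'archive.zip')

# Open the archive from disk and print its member names
with zipfile.ZipFile('archive.zip', 'r') as myzip:
    print(myzip.namelist())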
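
If you would rather stay with requests, the same download-to-disk idea can be sketched as below. This is a minimal Python 3 sketch, not a tested App Engine recipe: list_zip_names is a hypothetical helper name, and it assumes the instance gives you a writable temporary directory. It spools the download to a temporary file chunk by chunk, so only one chunk is held in memory at a time, and then lets zipfile read the file list from disk.

```python
import tempfile
import zipfile


def list_zip_names(chunks):
    """Write an iterable of byte chunks to a temp file and return the zip's member names.

    `list_zip_names` is a hypothetical helper, not part of requests or zipfile.
    """
    with tempfile.TemporaryFile() as tmp:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                tmp.write(chunk)
        tmp.seek(0)  # rewind so zipfile can read from the start
        with zipfile.ZipFile(tmp, 'r') as archive:
            return archive.namelist()


# Possible usage with requests (assumes the URL from the question; not run here):
#
# import requests
# with requests.get('http://url.to/archive.zip', stream=True) as r:
#     r.raise_for_status()
#     file_list = list_zip_names(r.iter_content(chunk_size=64 * 1024))
```

Passing stream=True keeps requests from reading the whole body up front, and iter_content then yields it in pieces of the requested size. If you also want to keep the archive for later, writing to a named file instead of a temporary one would work the same way.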
Answered Nov 14 '22 22:11 by Josh Kupershmidt