I am working on a script that runs on App Engine, so I have RAM limits I need to adhere to (for App Engine, that limit is 1024 MB).
I am downloading a large archive from which I need to extract the file list. The archive itself is just a file I am storing for later off-line use (if needed), but I need the file list because I check for changes in the zip archive each time I pull it.
Here is the code block I have now:
import requests
import StringIO
import zipfile

url = 'http://url.to/archive.zip'
r = requests.get(url)
file_mem = StringIO.StringIO(r.content)  # entire archive body held in memory
zip_file = zipfile.ZipFile(file_mem, 'r')
# get the list of files
file_list = zip_file.namelist()  # list of files -- also stored in memory
With the StringIO object, it's placing the entire archive into memory. Is there a way I can go from my r.content object to a file list without placing the entire file into memory at once?
Well, how about downloading the file to disk and then using the zipfile module to parse it there? That should save you from having to keep the .zip contents in memory, and hopefully it will work fine on a small App Engine instance.
import zipfile
import urllib

url = 'http://url.to/archive.zip'
# download the archive to local disk instead of holding it in memory
urllib.urlretrieve(url, 'archive.zip')

with zipfile.ZipFile('archive.zip', 'r') as myzip:
    # namelist() only needs the zip's central directory, not the file contents
    print myzip.namelist()
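If you'd rather keep using requests, here is a minimal sketch of the same idea that streams the response to disk in chunks instead of materializing r.content. It assumes requests is available on your instance and that /tmp is a writable location there:

import zipfile
import requests

url = 'http://url.to/archive.zip'
local_path = '/tmp/archive.zip'  # assumed writable path on the instance

# stream=True keeps requests from loading the whole response body into memory
r = requests.get(url, stream=True)
with open(local_path, 'wb') as f:
    for chunk in r.iter_content(chunk_size=1024 * 1024):  # write 1 MB at a time
        if chunk:
            f.write(chunk)

with zipfile.ZipFile(local_path, 'r') as myzip:
    print myzip.namelist()

Either way, peak memory use is roughly bounded by the download chunk size plus what zipfile needs to read the central directory, rather than by the full size of the archive.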