Downloading and unzipping a .zip file without writing to disk

Tags:

unzip

I have managed to get my first Python script working. It downloads a list of .zip files from a URL, extracts them, and writes the extracted files to disk.

I am now at a loss as to how to achieve the next step.

My primary goal is to download and extract the zip file and pass the contents (CSV data) via a TCP stream. I would prefer not to actually write any of the zip or extracted files to disk if I could get away with it.

Here is my current script which works but unfortunately has to write the files to disk.

    import urllib, urllister
    import zipfile
    import urllib2
    import os
    import time
    import pickle

    # check for extraction directories existence
    if not os.path.isdir('downloaded'):
        os.makedirs('downloaded')

    if not os.path.isdir('extracted'):
        os.makedirs('extracted')

    # open logfile for downloaded data and save to local variable
    if os.path.isfile('downloaded.pickle'):
        downloadedLog = pickle.load(open('downloaded.pickle'))
    else:
        downloadedLog = {'key':'value'}

    # remove entries older than 5 days (to maintain speed)

    # path of zip files
    zipFileURL = "http://www.thewebserver.com/that/contains/a/directory/of/zip/files"

    # retrieve list of URLs from the webservers
    usock = urllib.urlopen(zipFileURL)
    parser = urllister.URLLister()
    parser.feed(usock.read())
    usock.close()
    parser.close()

    # only parse urls
    for url in parser.urls:
        if "PUBLIC_P5MIN" in url:

            # download the file
            downloadURL = zipFileURL + url
            outputFilename = "downloaded/" + url

            # check if file already exists on disk
            if url in downloadedLog or os.path.isfile(outputFilename):
                print "Skipping " + downloadURL
                continue

            print "Downloading ",downloadURL
            response = urllib2.urlopen(downloadURL)
            zippedData = response.read()

            # save data to disk
            print "Saving to ",outputFilename
            output = open(outputFilename,'wb')
            output.write(zippedData)
            output.close()

            # extract the data
            zfobj = zipfile.ZipFile(outputFilename)
            for name in zfobj.namelist():
                uncompressed = zfobj.read(name)

                # save uncompressed data to disk
                outputFilename = "extracted/" + name
                print "Saving extracted file to ",outputFilename
                output = open(outputFilename,'wb')
                output.write(uncompressed)
                output.close()

                # send data via tcp stream

                # file successfully downloaded and extracted store into local log and filesystem log
                downloadedLog[url] = time.time()
                pickle.dump(downloadedLog, open('downloaded.pickle', "wb" ))

690

asked Apr 19 '11 02:04

user714415

2 Answers

Below is a code snippet I used to fetch a zipped CSV file:

Python 2:

    from StringIO import StringIO
    from zipfile import ZipFile
    from urllib import urlopen

    resp = urlopen("http://www.test.com/file.zip")
    zipfile = ZipFile(StringIO(resp.read()))
    for line in zipfile.open(file).readlines():
        print line

Python 3:

    from io import BytesIO
    from zipfile import ZipFile
    from urllib.request import urlopen
    # or: requests.get(url).content

    resp = urlopen("http://www.test.com/file.zip")
    zipfile = ZipFile(BytesIO(resp.read()))
    for line in zipfile.open(file).readlines():
        print(line.decode('utf-8'))

Here file is a string: the name of a member inside the archive. To find the name you want to pass, you can call zipfile.namelist(). For instance,

    resp = urlopen('http://mlg.ucd.ie/files/datasets/bbc.zip')
    zipfile = ZipFile(BytesIO(resp.read()))
    zipfile.namelist()
    # ['bbc.classes', 'bbc.docs', 'bbc.mtx', 'bbc.terms']
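Since the question mentions that the archives hold CSV data, here is a minimal Python 3 sketch of how namelist() and open() might be combined to parse every member without touching the disk. The helper name read_csv_members, the UTF-8 default, and the sample file names are assumptions made for illustration; the archive is built in memory so the example runs without a network connection.

```python
import csv
from io import BytesIO, TextIOWrapper
from zipfile import ZipFile


def read_csv_members(zip_bytes, encoding="utf-8"):
    """Return {member name: list of CSV rows} for an archive held in memory.

    Hypothetical helper: assumes every file in the archive is CSV text.
    """
    rows_by_name = {}
    with ZipFile(BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # directory entry, nothing to read
                continue
            with archive.open(name) as raw:
                # Wrap the binary stream so csv.reader receives text.
                text = TextIOWrapper(raw, encoding=encoding, newline="")
                rows_by_name[name] = list(csv.reader(text))
    return rows_by_name


# Build a small archive in memory to stand in for resp.read().
buffer = BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("a.csv", "id,value\n1,10\n")
    archive.writestr("b.csv", "id,value\n2,20\n")

rows = read_csv_members(buffer.getvalue())
for name, parsed in rows.items():
    print(name, parsed)
```

With a real download, buffer.getvalue() would be replaced by the bytes returned from resp.read().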

191

answered Sep 22 '22 23:09

Vishal

My suggestion would be to use a StringIO object. They emulate files, but reside in memory. So you could do something like this:

    # get_zip_data() gets a zip archive containing 'foo.txt', reading 'hey, foo'

    import zipfile
    from StringIO import StringIO

    zipdata = StringIO()
    zipdata.write(get_zip_data())
    myzipfile = zipfile.ZipFile(zipdata)
    foofile = myzipfile.open('foo.txt')
    print foofile.read()

    # output: "hey, foo"

Or more simply (apologies to Vishal):

    myzipfile = zipfile.ZipFile(StringIO(get_zip_data()))
    for name in myzipfile.namelist():
        [ ... ]

In Python 3 use BytesIO instead of StringIO:

    import zipfile
    from io import BytesIO

    filebytes = BytesIO(get_zip_data())
    myzipfile = zipfile.ZipFile(filebytes)
    for name in myzipfile.namelist():
        [ ... ]
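To connect this back to the question's goal of passing the extracted data over a TCP stream, the loop body might look roughly like the sketch below. It is only an illustration under stated assumptions: send_zip_members is a hypothetical helper, the 64 KiB chunk size is arbitrary, and a local socket.socketpair() stands in for a real server so the example runs on its own. In practice the sending socket would more likely come from something like socket.create_connection((host, port)), with host and port of your own choosing.

```python
import socket
from io import BytesIO
from zipfile import ZipFile


def send_zip_members(zip_bytes, sock, chunk_size=64 * 1024):
    """Extract every file in an in-memory archive and send its bytes over sock.

    Hypothetical helper. Returns the number of payload bytes sent.
    """
    sent = 0
    with ZipFile(BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Read each member in chunks so a large file is never held
            # fully uncompressed in memory.
            with archive.open(info) as member:
                while True:
                    chunk = member.read(chunk_size)
                    if not chunk:
                        break
                    sock.sendall(chunk)
                    sent += len(chunk)
    return sent


# Stand-in for get_zip_data(): a tiny archive built in memory.
buffer = BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("data.csv", "id,value\n1,10\n2,20\n")

# A connected local socket pair replaces a real TCP server for this demo.
sender, receiver = socket.socketpair()
with sender, receiver:
    total = send_zip_members(buffer.getvalue(), sender)
    sender.shutdown(socket.SHUT_WR)  # signal end of stream to the receiver
    received = b""
    while True:
        part = receiver.recv(4096)
        if not part:
            break
        received += part

print(received.decode("utf-8"))
```

The demo sends before it receives, which is fine for a payload this small; with larger data the receiving side would need to read concurrently, as a real server would.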

answered Sep 24 '22 23:09

senderle
