I have managed to get my first python script to work which downloads a list of .ZIP files from a URL and then proceeds to extract the ZIP files and writes them to disk.
I am now at a loss to achieve the next step.
My primary goal is to download and extract the zip file and pass the contents (CSV data) via a TCP stream. I would prefer not to actually write any of the zip or extracted files to disk if I could get away with it.
Here is my current script which works but unfortunately has to write the files to disk.
import urllib, urllister import zipfile import urllib2 import os import time import pickle # check for extraction directories existence if not os.path.isdir('downloaded'): os.makedirs('downloaded') if not os.path.isdir('extracted'): os.makedirs('extracted') # open logfile for downloaded data and save to local variable if os.path.isfile('downloaded.pickle'): downloadedLog = pickle.load(open('downloaded.pickle')) else: downloadedLog = {'key':'value'} # remove entries older than 5 days (to maintain speed) # path of zip files zipFileURL = "http://www.thewebserver.com/that/contains/a/directory/of/zip/files" # retrieve list of URLs from the webservers usock = urllib.urlopen(zipFileURL) parser = urllister.URLLister() parser.feed(usock.read()) usock.close() parser.close() # only parse urls for url in parser.urls: if "PUBLIC_P5MIN" in url: # download the file downloadURL = zipFileURL + url outputFilename = "downloaded/" + url # check if file already exists on disk if url in downloadedLog or os.path.isfile(outputFilename): print "Skipping " + downloadURL continue print "Downloading ",downloadURL response = urllib2.urlopen(downloadURL) zippedData = response.read() # save data to disk print "Saving to ",outputFilename output = open(outputFilename,'wb') output.write(zippedData) output.close() # extract the data zfobj = zipfile.ZipFile(outputFilename) for name in zfobj.namelist(): uncompressed = zfobj.read(name) # save uncompressed data to disk outputFilename = "extracted/" + name print "Saving extracted file to ",outputFilename output = open(outputFilename,'wb') output.write(uncompressed) output.close() # send data via tcp stream # file successfully downloaded and extracted store into local log and filesystem log downloadedLog[url] = time.time(); pickle.dump(downloadedLog, open('downloaded.pickle', "wb" ))
1) Right-click the compressed (zipped) folder. 2) Select "Extract All" from the context menu. 3) By default, the compressed files will extract in the same location as the zipped folder, but you can click the Browse button to select an alternative location. 4) Check the option "Show extracted files when complete".
Tip 1: Move the Zip File to Another Location A possible reason why you are encountering the Windows cannot complete the extraction error, is that the zip file is located in a protected place. You can fix this by moving the zip file to a different location like a different profile folder.
zip lists the contents of a ZIP archive to ensure your file is inside. Use the -p option to write the contents of named files to stdout (screen) without having to uncompress the entire archive.
Below is a code snippet I used to fetch zipped csv file, please have a look:
Python 2:
from StringIO import StringIO from zipfile import ZipFile from urllib import urlopen resp = urlopen("http://www.test.com/file.zip") zipfile = ZipFile(StringIO(resp.read())) for line in zipfile.open(file).readlines(): print line
Python 3:
from io import BytesIO from zipfile import ZipFile from urllib.request import urlopen # or: requests.get(url).content resp = urlopen("http://www.test.com/file.zip") zipfile = ZipFile(BytesIO(resp.read())) for line in zipfile.open(file).readlines(): print(line.decode('utf-8'))
Here file
is a string. To get the actual string that you want to pass, you can use zipfile.namelist()
. For instance,
resp = urlopen('http://mlg.ucd.ie/files/datasets/bbc.zip') zipfile = ZipFile(BytesIO(resp.read())) zipfile.namelist() # ['bbc.classes', 'bbc.docs', 'bbc.mtx', 'bbc.terms']
My suggestion would be to use a StringIO
object. They emulate files, but reside in memory. So you could do something like this:
# get_zip_data() gets a zip archive containing 'foo.txt', reading 'hey, foo' import zipfile from StringIO import StringIO zipdata = StringIO() zipdata.write(get_zip_data()) myzipfile = zipfile.ZipFile(zipdata) foofile = myzipfile.open('foo.txt') print foofile.read() # output: "hey, foo"
Or more simply (apologies to Vishal):
myzipfile = zipfile.ZipFile(StringIO(get_zip_data())) for name in myzipfile.namelist(): [ ... ]
In Python 3 use BytesIO instead of StringIO:
import zipfile from io import BytesIO filebytes = BytesIO(get_zip_data()) myzipfile = zipfile.ZipFile(filebytes) for name in myzipfile.namelist(): [ ... ]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With