
Downloading and unzipping a .zip file without writing to disk

Tags: python, unzip

I have managed to get my first Python script working. It downloads a list of .zip files from a URL, then extracts them and writes the contents to disk.

I am now at a loss as to how to achieve the next step.

My primary goal is to download and extract the zip file and pass the contents (CSV data) via a TCP stream. I would prefer not to actually write any of the zip or extracted files to disk if I could get away with it.

Here is my current script, which works but unfortunately has to write the files to disk.

import urllib, urllister
import zipfile
import urllib2
import os
import time
import pickle

# check for extraction directories existence
if not os.path.isdir('downloaded'):
    os.makedirs('downloaded')

if not os.path.isdir('extracted'):
    os.makedirs('extracted')

# open logfile for downloaded data and save to local variable
if os.path.isfile('downloaded.pickle'):
    downloadedLog = pickle.load(open('downloaded.pickle'))
else:
    downloadedLog = {'key':'value'}

# remove entries older than 5 days (to maintain speed)

# path of zip files
zipFileURL = "http://www.thewebserver.com/that/contains/a/directory/of/zip/files"

# retrieve list of URLs from the webservers
usock = urllib.urlopen(zipFileURL)
parser = urllister.URLLister()
parser.feed(usock.read())
usock.close()
parser.close()

# only parse urls
for url in parser.urls:

    if "PUBLIC_P5MIN" in url:

        # download the file
        downloadURL = zipFileURL + url
        outputFilename = "downloaded/" + url

        # check if file already exists on disk
        if url in downloadedLog or os.path.isfile(outputFilename):
            print "Skipping " + downloadURL
            continue

        print "Downloading ",downloadURL
        response = urllib2.urlopen(downloadURL)
        zippedData = response.read()

        # save data to disk
        print "Saving to ",outputFilename
        output = open(outputFilename,'wb')
        output.write(zippedData)
        output.close()

        # extract the data
        zfobj = zipfile.ZipFile(outputFilename)
        for name in zfobj.namelist():
            uncompressed = zfobj.read(name)

            # save uncompressed data to disk
            outputFilename = "extracted/" + name
            print "Saving extracted file to ",outputFilename
            output = open(outputFilename,'wb')
            output.write(uncompressed)
            output.close()

            # send data via tcp stream

            # file successfully downloaded and extracted store into local log and filesystem log
            downloadedLog[url] = time.time();
            pickle.dump(downloadedLog, open('downloaded.pickle', "wb" ))
asked Apr 19 '11 02:04 by user714415




2 Answers

Below is a code snippet I used to fetch a zipped CSV file:

Python 2:

from StringIO import StringIO
from zipfile import ZipFile
from urllib import urlopen

resp = urlopen("http://www.test.com/file.zip")
zipfile = ZipFile(StringIO(resp.read()))
for line in zipfile.open(file).readlines():
    print line

Python 3:

from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
# or: requests.get(url).content

resp = urlopen("http://www.test.com/file.zip")
zipfile = ZipFile(BytesIO(resp.read()))
for line in zipfile.open(file).readlines():
    print(line.decode('utf-8'))

Here file is a string: the name of the archive member you want to read. To find the names available in the archive, call zipfile.namelist(). For instance:

resp = urlopen('http://mlg.ucd.ie/files/datasets/bbc.zip')
zipfile = ZipFile(BytesIO(resp.read()))
zipfile.namelist()
# ['bbc.classes', 'bbc.docs', 'bbc.mtx', 'bbc.terms']
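Once the member names are known, the CSV data can be parsed straight from memory. The following is a minimal Python 3 sketch and not part of the original answer. It builds a small archive in memory as a stand-in for the downloaded bytes, then wraps each member in io.TextIOWrapper so that csv.reader receives text. The helper name read_csv_members, the member name data.csv, its contents, and the UTF-8 encoding are all assumptions made for illustration.

```python
import csv
import io
from zipfile import ZipFile


def read_csv_members(zip_bytes, encoding="utf-8"):
    """Return {member name: list of CSV rows} without touching the disk.

    zip_bytes is the raw archive, e.g. the result of resp.read().
    The encoding is an assumption; adjust it to match the real files.
    """
    rows_by_name = {}
    with ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # archive.open() yields a binary file-like object;
            # TextIOWrapper decodes it so csv.reader receives text.
            with archive.open(name) as member:
                text = io.TextIOWrapper(member, encoding=encoding, newline="")
                rows_by_name[name] = list(csv.reader(text))
    return rows_by_name


# Stand-in for a downloaded archive (hypothetical member name and contents).
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("data.csv", "id,value\n1,foo\n2,bar\n")

rows = read_csv_members(buffer.getvalue())
print(rows["data.csv"])
```

With a real download, the bytes passed to read_csv_members would be resp.read() in place of buffer.getvalue().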
answered Sep 22 '22 23:09 by Vishal


My suggestion would be to use a StringIO object. StringIO objects emulate files but reside in memory, so you could do something like this:

# get_zip_data() gets a zip archive containing 'foo.txt', reading 'hey, foo'

import zipfile
from StringIO import StringIO

zipdata = StringIO()
zipdata.write(get_zip_data())
myzipfile = zipfile.ZipFile(zipdata)
foofile = myzipfile.open('foo.txt')
print foofile.read()

# output: "hey, foo"

Or more simply (apologies to Vishal):

myzipfile = zipfile.ZipFile(StringIO(get_zip_data()))
for name in myzipfile.namelist():
    [ ... ]

In Python 3 use BytesIO instead of StringIO:

import zipfile
from io import BytesIO

filebytes = BytesIO(get_zip_data())
myzipfile = zipfile.ZipFile(filebytes)
for name in myzipfile.namelist():
    [ ... ]
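To tie this back to the question's goal of passing the extracted data over TCP, here is a hedged Python 3 sketch that is not part of the original answer. The function name send_zip_members and the sample archive are hypothetical, and socket.socketpair() is used only as a local stand-in for a real TCP connection; in practice the socket might come from socket.create_connection((host, port)). The question does not specify any message framing, so this sketch assumes the members can simply be written back to back.

```python
import io
import socket
import zipfile


def send_zip_members(zip_bytes, sock, chunk_size=64 * 1024):
    """Decompress each archive member in memory and write it to sock.

    Returns the total number of uncompressed bytes sent.
    No framing is applied; members are sent back to back (an assumption).
    """
    total = 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            with archive.open(name) as member:
                # Read in chunks so a large member is never held in full.
                while True:
                    chunk = member.read(chunk_size)
                    if not chunk:
                        break
                    sock.sendall(chunk)
                    total += len(chunk)
    return total


# Stand-in archive; real data would come from urlopen(url).read().
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("foo.csv", "a,b\n1,2\n")

# socketpair() gives two connected sockets, standing in for a TCP link.
sender, receiver = socket.socketpair()
sent = send_zip_members(buffer.getvalue(), sender)
sender.close()
received = receiver.recv(1024)
receiver.close()
print(sent, received)
```

Reading each member in chunks keeps memory use bounded by the compressed archive plus one chunk, rather than by the full uncompressed size.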
answered Sep 24 '22 23:09 by senderle