
Download and unzip file with Python

I am trying to download and open a zipped file and seem to be having trouble using a file type handle with zipfile. I'm getting the error "AttributeError: addinfourl instance has no attribute 'seek'" when running this:

import zipfile
import urllib2

def download(url,directory,name):
 webfile = urllib2.urlopen('http://www.sec.gov'+url)
 webfile2 = zipfile.ZipFile(webfile)
 content = zipfile.ZipFile.open(webfile2).read()
 localfile = open(directory+name, 'w')
 localfile.write(content)
 localfile.close()
 return()

download(link.get("href"),'./fails_data', link.text)
erin asked Jul 28 '11 15:07



3 Answers

Putting things together, the following retrieves the content of the first file within a zipped file from a website:

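import urllib.request
import zipfile

url = 'http://www.gutenberg.lib.md.us/4/8/8/2/48824/48824-8.zip'
# urlretrieve saves the download to a temporary file and returns (path, headers).
zip_path, _ = urllib.request.urlretrieve(url)
zip_file_object = zipfile.ZipFile(zip_path, 'r')
# namelist() lists the archive members in order; take the first one.
first_file = zip_file_object.namelist()[0]
# ZipFile.open returns a file-like object for the named member.
file = zip_file_object.open(first_file)
content = file.read()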
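
To save the extracted file to disk, as the question asks, something like the following might work. It is only a sketch for Python 3: the function names save_first_member and download_and_save are made up for illustration, and it assumes the first archive member is the file you want.

```python
import os
import urllib.request
import zipfile


def save_first_member(zip_path, directory):
    """Extract the first archive member into directory and return its path."""
    with zipfile.ZipFile(zip_path, 'r') as archive:
        first_name = archive.namelist()[0]
        # extract() creates the target directory if needed and
        # returns the path of the extracted file.
        return archive.extract(first_name, path=directory)


def download_and_save(url, directory):
    """Download url to a temporary file, then extract its first member."""
    # urlretrieve returns (local_path, headers); with no filename argument
    # it writes to a temporary file.
    zip_path, _ = urllib.request.urlretrieve(url)
    try:
        return save_first_member(zip_path, directory)
    finally:
        os.remove(zip_path)  # remove the temporary download
```

Calling download_and_save(url, './fails_data') would then return the path of the extracted file. Using extract() rather than read() plus a manual write avoids opening the output in text mode, which can corrupt binary content.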
Marius answered Oct 10 '22 21:10


The object returned by urllib2.urlopen does not support seek(), which zipfile.ZipFile requires. The methods it does support are listed here: http://docs.python.org/library/urllib.html#urllib.urlopen.

You'll have to retrieve the file (possibly with urllib.urlretrieve, http://docs.python.org/library/urllib.html#urllib.urlretrieve), then use zipfile on it.

Alternatively, if you want the zipped data in memory, you could read() the urlopened file, put the result into a StringIO (io.BytesIO on Python 3), and use zipfile on that. Also check out the extract and extractall methods of zipfile if you just want to extract the files to disk instead of using read.
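
The in-memory route could look roughly like this on Python 3, where io.BytesIO takes the place of StringIO for binary data. The helper names fetch_bytes, read_first_member and extract_all are hypothetical, and the sketch assumes the archive is small enough to hold in memory.

```python
import io
import urllib.request
import zipfile


def fetch_bytes(url):
    """Read the whole HTTP response body into memory."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def read_first_member(zip_bytes):
    """Return (name, content) for the first member of an in-memory zip."""
    # BytesIO is seekable, which ZipFile needs and the raw response is not.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        first_name = archive.namelist()[0]
        return first_name, archive.read(first_name)


def extract_all(zip_bytes, directory):
    """Write every member of an in-memory zip under directory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(directory)
```

For example, read_first_member(fetch_bytes(url)) returns the member name and its raw bytes without touching the disk.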

agf answered Oct 10 '22 23:10


As of 2020, you can use dload to download and unzip a file, i.e.:

import dload
dload.save_unzip("https://file-examples.com/wp-content/uploads/2017/02/zip_2MB.zip")

By default it extracts to a directory on the script path named after the zip file, but you can specify the extract location:

dload.save_unzip("https://file-examples.com/wp-content/uploads/2017/02/zip_2MB.zip", "/extract/here")

Install it with pip install dload.

Pedro Lobito answered Oct 10 '22 23:10