
Open file from zip without extracting it in Python?

Tags: python, zip

I am working on a script that fetches a zip file from a URL using the requests library. That zip file contains a csv file. I'm trying to read that csv file without saving it to disk, but parsing fails with this error: _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

import csv
import requests
from io import BytesIO, StringIO
from zipfile import ZipFile

response = requests.get(url)
zip_file = ZipFile(BytesIO(response.content))
files = zip_file.namelist()
with zip_file.open(files[0]) as csvfile:
    csvreader = csv.reader(csvfile)

    # _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

    for row in csvreader:
        print(row)
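The error can be reproduced without any network request. This is a minimal sketch that assumes a small in-memory archive stands in for `response.content` (the member name `data.csv` and its contents are made up). `ZipFile.open()` returns a binary stream, so every line it yields is `bytes`, and `csv.reader` rejects it as soon as iteration starts, not when the reader is created.

```python
import csv
import io
import zipfile


def make_zip_bytes():
    """Build a small ZIP archive in memory (hypothetical stand-in for response.content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("data.csv", "name,qty\napple,3\npear,5\n")
    return buf.getvalue()


def reproduce_error(zip_bytes):
    """Return the csv.Error message raised when csv.reader gets a binary stream."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        with zf.open(zf.namelist()[0]) as member:
            reader = csv.reader(member)  # no error yet: the reader is lazy
            try:
                next(reader)  # first iteration yields bytes, which csv rejects
            except csv.Error as exc:
                return str(exc)
    return None


message = reproduce_error(make_zip_bytes())
print(message)
```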
Asked Mar 28 '18 at 11:03 by Magnotta



2 Answers

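Try this (pandas.read_csv accepts the binary file object returned by ZipFile.open(), so no text wrapper is needed):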

import pandas as pd
import requests
from io import BytesIO
from zipfile import ZipFile

response = requests.get(url)
zip_file = ZipFile(BytesIO(response.content))
files = zip_file.namelist()
with zip_file.open(files[0]) as csvfile:
    print(pd.read_csv(csvfile, encoding='utf8', sep=","))
Answered Oct 25 '22 at 10:10 by nikhalster



As @Aran-Fey alluded to:

import zipfile
import csv
import io

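with open('/path/to/archive.zip', 'rb') as f:  # ZipFile needs a binary file object
    with zipfile.ZipFile(f) as zf:
        csv_filename = zf.namelist()[0]  # see namelist() for the list of files in the archive
        with zf.open(csv_filename) as csv_f:
            # Assumes the CSV is UTF-8; newline='' is what the csv docs recommend
            csv_f_as_text = io.TextIOWrapper(csv_f, encoding='utf-8', newline='')
            reader = csv.reader(csv_f_as_text)
            for row in reader:
                print(row)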

csv.reader (and csv.DictReader) require a file-like object opened in text mode. Normally this is not a problem when you simply open(...) a file in 'r' mode because, as the Python 3 docs say, text mode is the default: "The default mode is 'r' (open for reading text, synonym of 'rt')". But if you pass 'rt' to ZipFile.open(), you get an error saying that it requires mode "r" or "w":

        with zf.open(csv_filename, 'rt') as csv_f:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...    
ValueError: open() requires mode "r" or "w"

That's what io.TextIOWrapper is for: it wraps a byte stream so it can be read as text, decoding it on the fly.
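Applied to the original question, a minimal sketch might look like the following. It assumes the archive's first member is a UTF-8 CSV with a header row; the helper name `read_zip_csv` and the sample data are hypothetical. In the real script you would pass `response.content` from `requests.get(url)` instead of the in-memory archive built here. It uses csv.DictReader to show that the same text-mode requirement applies to it.

```python
import csv
import io
import zipfile


def read_zip_csv(zip_bytes, encoding="utf-8"):
    """Return the rows of the first member of an in-memory ZIP as dicts.

    Hypothetical helper: assumes that member is a CSV file with a header
    row, encoded as `encoding` (UTF-8 by default).
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        member_name = zf.namelist()[0]
        with zf.open(member_name) as raw:  # binary stream, nothing extracted to disk
            # Decode bytes to str on the fly; newline="" as the csv docs recommend
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            # Materialize the rows before the member stream is closed
            return list(csv.DictReader(text))


# Build a small archive in memory as a stand-in for response.content
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.csv", "name,qty\napple,3\npear,5\n")

rows = read_zip_csv(buf.getvalue())
print(rows)
```

TextIOWrapper decodes incrementally, so a large member is never held in memory as one string. For small files, io.StringIO(zf.read(member_name).decode('utf-8')) is a simpler alternative that reads the whole member at once.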

Answered Oct 25 '22 at 09:10 by grisaitis
