
How do you read a file inside a zip file as text, not bytes?

A simple program for reading a CSV file inside a zip file works in Python 2.7, but not in Python 3.2

    $ cat test_zip_file_py3k.py
    import csv, sys, zipfile

    zip_file    = zipfile.ZipFile(sys.argv[1])
    items_file  = zip_file.open('items.csv', 'rU')

    for row in csv.DictReader(items_file):
        pass

    $ python2.7 test_zip_file_py3k.py ~/data.zip

    $ python3.2 test_zip_file_py3k.py ~/data.zip
    Traceback (most recent call last):
      File "test_zip_file_py3k.py", line 8, in <module>
        for row in csv.DictReader(items_file):
      File "/home/msabramo/run/lib/python3.2/csv.py", line 109, in __next__
        self.fieldnames
      File "/home/msabramo/run/lib/python3.2/csv.py", line 96, in fieldnames
        self._fieldnames = next(self.reader)
    _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

So the csv module in Python 3 wants to see a text file, but zipfile.ZipFile.open returns a zipfile.ZipExtFile that is always treated as binary data.
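To make the mismatch concrete, here is a minimal sketch that builds a tiny archive in memory and reads one line from a member. The archive contents and the name `items.csv` are made-up sample data, not the original `data.zip`.

```python
import io
import zipfile

# Build a small zip archive in memory so the example needs no files on disk.
# The CSV contents are hypothetical sample data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("items.csv", "name,price\nwidget,3\n")

# Open the member the same way the failing script does (default binary mode).
with zipfile.ZipFile(buffer) as archive:
    with archive.open("items.csv") as member:
        first_line = member.readline()

# The member yields bytes objects, which csv rejects in Python 3.
print(type(first_line).__name__)
```

Since the member hands back `bytes` rather than `str`, the csv reader has nothing to decode with and raises the error shown above.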

How does one make this work in Python 3?

Marc Abramowitz asked Apr 11 '11 21:04



2 Answers

I just noticed that Lennart's answer doesn't work with Python 3.1, but it does work with Python 3.2. zipfile.ZipExtFile was enhanced in Python 3.2 (see the release notes), and these changes appear to make zipfile.ZipExtFile work nicely with io.TextIOWrapper.

Incidentally, it also works in Python 3.1 if you uncomment the hacky lines below to monkey-patch zipfile.ZipExtFile. I wouldn't recommend this sort of hackery; I include it only to illustrate the essence of what was done in Python 3.2 to make things work nicely.

    $ cat test_zip_file_py3k.py
    import csv, io, sys, zipfile

    zip_file    = zipfile.ZipFile(sys.argv[1])
    items_file  = zip_file.open('items.csv', 'rU')
    # items_file.readable = lambda: True
    # items_file.writable = lambda: False
    # items_file.seekable = lambda: False
    # items_file.read1 = items_file.read
    items_file  = io.TextIOWrapper(items_file)

    for idx, row in enumerate(csv.DictReader(items_file)):
        print('Processing row {0} -- row = {1}'.format(idx, row))

If I had to support py3k < 3.2, then I would go with the solution in my other answer.
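The other answer isn't reproduced on this page, so the following is not necessarily what it did. It is just one generic way to avoid depending on ZipExtFile's stream interface: read the whole member as bytes, decode it, and hand the text to csv through io.StringIO. The function name, the UTF-8 default, and the sample data are assumptions for illustration, and the approach loads the entire member into memory.

```python
import csv
import io
import zipfile

def read_csv_rows_via_decode(zip_source, member_name, encoding="utf-8"):
    """Read a CSV member fully into memory, decode it, and parse the text."""
    archive = zipfile.ZipFile(zip_source)
    try:
        data = archive.read(member_name)  # the whole member, as bytes
    finally:
        archive.close()
    # newline="" leaves line endings untouched so csv can handle them itself.
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.DictReader(text))

# Demo with an archive built in memory (hypothetical sample data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("items.csv", "name,price\nwidget,3\ngadget,5\n")

rows = read_csv_rows_via_decode(buffer, "items.csv")
print([row["name"] for row in rows])  # prints ['widget', 'gadget']
```

For large members the streaming io.TextIOWrapper approach is likely the better fit, since it doesn't hold the whole file in memory.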

Marc Abramowitz answered Sep 18 '22 05:09


You can wrap it in an io.TextIOWrapper.

items_file  = io.TextIOWrapper(items_file, encoding='your-encoding', newline='') 

Should work.
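For a fuller picture, here is a self-contained sketch of that wrapping on Python 3.2 or later. The helper name `read_csv_rows`, the UTF-8 default, and the in-memory sample archive are assumptions for illustration. The member is opened with the default mode rather than 'rU', because Python 3.6 and later no longer accept 'U' modes in `ZipFile.open`.

```python
import csv
import io
import zipfile

def read_csv_rows(zip_source, member_name, encoding="utf-8"):
    """Return the rows of a CSV member of a zip archive as a list of dicts."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as raw:
            # Decode the binary member on the fly; newline="" lets the csv
            # module deal with line endings itself, as its docs recommend.
            with io.TextIOWrapper(raw, encoding=encoding, newline="") as text:
                return list(csv.DictReader(text))

# Demo with an archive built in memory (hypothetical sample data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("items.csv", "name,price\r\nwidget,3\r\n")

rows = read_csv_rows(buffer, "items.csv")
print(rows[0]["name"], rows[0]["price"])  # prints: widget 3
```

Replace `encoding` with whatever encoding the CSV was actually written in; UTF-8 is only a guess here.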

Lennart Regebro answered Sep 22 '22 05:09