I'm trying to use the csv module to read a UTF-8 CSV file, and I'm having trouble writing code that works on both Python 2 and 3 because of encoding.
Here is the original code in Python 2.7:
    with open(filename, 'rb') as csvfile:
        csv_reader = csv.reader(csvfile, quotechar='\"')
        langs = next(csv_reader)[1:]
        for row in csv_reader:
            pass
But when I run it with Python 3, it doesn't like the fact that I open the file without an "encoding". I tried this:
    with codecs.open(filename, 'r', encoding='utf-8') as csvfile:
        csv_reader = csv.reader(csvfile, quotechar='\"')
        langs = next(csv_reader)[1:]
        for row in csv_reader:
            pass
Now Python 2 can't decode the line in the "for" loop. So how should I do it?
Indeed, in Python 2 the file should be opened in binary mode, but in Python 3 in text mode. Also, in Python 3 newline='' should be specified (which you forgot).
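To see why newline='' matters, here is a minimal Python 3 sketch. The sample file and the helper name rows_from are made up for illustration. It reads the same file twice, once with newline='' and once without, and prints a quoted field that contains a Windows line break.

```python
import csv
import os
import tempfile


def rows_from(path, **open_kwargs):
    """Read every row of a UTF-8 CSV file; extra keywords go to open()."""
    with open(path, 'r', encoding='utf-8', **open_kwargs) as csv_file:
        return list(csv.reader(csv_file))


# Hypothetical sample: a quoted field that contains a Windows line break.
fd, sample_path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as sample:
    sample.write(b'id,text\r\n1,"line one\r\nline two"\r\n')

with_newline_arg = rows_from(sample_path, newline='')
without_newline_arg = rows_from(sample_path)
os.remove(sample_path)

print(repr(with_newline_arg[1][1]))     # the \r\n inside the field is kept
print(repr(without_newline_arg[1][1]))  # the \r\n has been translated to \n
```

Without newline='', Python's universal newline handling rewrites the line break before the csv module ever sees it, so the field no longer matches what is in the file.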
You'll have to do the file opening in an if-block.
    import sys

    if sys.version_info[0] < 3:
        infile = open(filename, 'rb')
    else:
        infile = open(filename, 'r', newline='', encoding='utf8')

    with infile as csvfile:
        ...
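If you open CSV files in more than one place, the if-block can be wrapped in a small helper. This is only a sketch: the names open_csv_for_reading and read_rows are invented here, and on Python 2 the rows still come back as undecoded byte strings.

```python
import csv
import sys


def open_csv_for_reading(path, encoding='utf-8'):
    """Open path the way the csv module of the running Python expects."""
    if sys.version_info[0] < 3:
        # Python 2: csv wants bytes, so open in binary mode.
        # The encoding argument is unused here; fields stay byte strings.
        return open(path, 'rb')
    # Python 3: csv wants text; newline='' leaves line endings to csv.
    return open(path, 'r', newline='', encoding=encoding)


def read_rows(path, encoding='utf-8'):
    """Return all rows of the CSV file at path as a list of lists."""
    with open_csv_for_reading(path, encoding) as csvfile:
        return list(csv.reader(csvfile, quotechar='"'))
```

Calling read_rows(filename) then behaves like the loop in the question, except that it collects the rows in a list instead of iterating lazily.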
Update: While the code in my original answer works, I have since released a small package at https://pypi.python.org/pypi/csv342 that provides a Python 3-like interface for Python 2. So, independent of your Python version, you can simply do:
    import csv342 as csv
    import io

    with io.open('some.csv', 'r', encoding='utf-8', newline='') as csv_file:
        for row in csv.reader(csv_file, delimiter='|'):
            print(row)
Original answer: Here's a solution that even with Python 2 actually decodes the text to Unicode strings and consequently works with encodings other than UTF-8.
The code below defines a function csv_rows() that returns the contents of a file as a sequence of lists. Example usage:
    for row in csv_rows('some.csv', encoding='iso-8859-15', delimiter='|'):
        print(row)
Here are the two variants of csv_rows(): one for Python 3+ and another for Python 2.6+. At runtime the code automatically picks the proper variant. UTF8Recoder and UnicodeReader are verbatim copies of the examples in the Python 2.7 library documentation.
    import csv
    import io
    import sys

    if sys.version_info[0] >= 3:
        # Python 3 variant.
        def csv_rows(csv_path, encoding, **keywords):
            with io.open(csv_path, 'r', newline='', encoding=encoding) as csv_file:
                for row in csv.reader(csv_file, **keywords):
                    yield row

    else:
        # Python 2 variant.
        import codecs

        class UTF8Recoder:
            """
            Iterator that reads an encoded stream and reencodes the input to UTF-8
            """
            def __init__(self, f, encoding):
                self.reader = codecs.getreader(encoding)(f)

            def __iter__(self):
                return self

            def next(self):
                return self.reader.next().encode("utf-8")

        class UnicodeReader:
            """
            A CSV reader which will iterate over lines in the CSV file "f",
            which is encoded in the given encoding.
            """
            def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
                f = UTF8Recoder(f, encoding)
                self.reader = csv.reader(f, dialect=dialect, **kwds)

            def next(self):
                row = self.reader.next()
                return [unicode(s, "utf-8") for s in row]

            def __iter__(self):
                return self

        def csv_rows(csv_path, encoding, **kwds):
            with io.open(csv_path, 'rb') as csv_file:
                for row in UnicodeReader(csv_file, encoding=encoding, **kwds):
                    yield row
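As a quick check that encodings other than UTF-8 really work, here is a small sketch with made-up sample data. To keep the snippet runnable on its own, it repeats only the Python 3 variant of csv_rows(), so this particular snippet is Python 3 only.

```python
import csv
import io
import os
import tempfile


def csv_rows(csv_path, encoding, **keywords):
    # Python 3 variant from the answer, repeated so the snippet runs alone.
    with io.open(csv_path, 'r', newline='', encoding=encoding) as csv_file:
        for row in csv.reader(csv_file, **keywords):
            yield row


# Hypothetical sample data: the euro sign is byte 0xA4 in ISO-8859-15.
sample_text = u'item|price\ncaf\u00e9|3\u20ac\n'
fd, sample_path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as sample:
    sample.write(sample_text.encode('iso-8859-15'))

rows = list(csv_rows(sample_path, encoding='iso-8859-15', delimiter='|'))
os.remove(sample_path)

for row in rows:
    print(row)
```

Both the accented letter and the euro sign should come back as proper Unicode characters, which would not be the case if the file were read as UTF-8.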