
unicodecsv reader from unicode string not working?

I'm having trouble reading a unicode CSV string with python-unicodecsv:

>>> import unicodecsv, StringIO
>>> f = StringIO.StringIO(u'é,é')
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> row = r.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/guy/test/.env/lib/python2.7/site-packages/unicodecsv/__init__.py", line 101, in next
    row = self.reader.next()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

I'm guessing it's an issue with how I convert my unicode string into a StringIO file somehow? The example on the python-unicodecsv github page works fine:

>>> import unicodecsv
>>> from cStringIO import StringIO
>>> f = StringIO()
>>> w = unicodecsv.writer(f, encoding='utf-8')
>>> w.writerow((u'é', u'ñ'))
>>> f.seek(0)
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> row = r.next()
>>> print row[0], row[1]
é ñ

Trying my code with cStringIO fails, as cStringIO can't accept unicode (so why the example works, I don't know!):

>>> from cStringIO import StringIO
>>> f = StringIO(u'é')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

I need to accept UTF-8 CSV-formatted input from a web textarea form field, so I can't just read it in from a file.

Any ideas?

Guy Bowden asked Mar 21 '23 17:03



1 Answer

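The unicodecsv reader reads and decodes byte strings for you. You are passing it unicode strings instead. On output, the unicodecsv writer encodes your unicode values to bytestrings for you, using the configured codec, which is why the GitHub example works: the reader there only ever sees the bytes the writer produced.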

In addition, cStringIO.StringIO can only handle encoded bytestrings, while the pure-Python StringIO.StringIO class happily accepts unicode values as if they were byte strings, so the error only surfaces later, when the reader pulls a line from it.
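To make the failure concrete, here is a minimal sketch of the conversion involved. It assumes the traceback in the question comes from Python 2 implicitly encoding the unicode line with the ASCII codec; the explicit text.encode('ascii') call below is only a stand-in for that implicit step, not what the library literally calls. The variable names are illustrative.

```python
# -*- coding: utf-8 -*-
text = u'\xe9,\xe9'  # the unicode string u'é,é' from the question

# Stand-in for the implicit unicode -> bytes conversion: the ASCII codec
# cannot represent u'\xe9', so this raises UnicodeEncodeError.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly with UTF-8 succeeds and round-trips cleanly.
encoded = text.encode('utf-8')
decoded = encoded.decode('utf-8')

print(ascii_failed)     # True
print(len(encoded))     # 5: two bytes per accented letter plus the comma
print(decoded == text)  # True
```

The encoded bytes are what a byte-oriented reader expects to receive; the decode step at the end mirrors what happens to each field on the way back out.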

The solution is to encode your unicode values before putting them into the StringIO object:

>>> import unicodecsv, StringIO, cStringIO
>>> f = StringIO.StringIO(u'é,é'.encode('utf8'))
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> next(r)
[u'\xe9', u'\xe9']
>>> f = cStringIO.StringIO(u'é,é'.encode('utf8'))
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> next(r)
[u'\xe9', u'\xe9']
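For the textarea case in the question, the encoding step could be wrapped in a small helper. This is only a sketch: to_byte_stream is a hypothetical name, not part of unicodecsv, and it assumes the form value arrives as a unicode string (values that are already bytes are passed through unchanged). It uses io.BytesIO from the standard library, which holds bytes just like cStringIO does.

```python
# -*- coding: utf-8 -*-
import io


def to_byte_stream(text, encoding='utf-8'):
    """Hypothetical helper: wrap form text in a file-like object of bytes."""
    # Encode unicode input; leave input that is already bytes untouched.
    if not isinstance(text, bytes):
        text = text.encode(encoding)
    return io.BytesIO(text)


# Two CSV rows as they might arrive from the form field.
stream = to_byte_stream(u'\xe9,\xe9\n\xf1,\xf1')

# getvalue() inspects the buffer without moving the read position.
print(stream.getvalue() == b'\xc3\xa9,\xc3\xa9\n\xc3\xb1,\xc3\xb1')  # True
```

The returned stream can then be passed to unicodecsv.reader(stream, encoding='utf-8') in the same way as the StringIO objects above. The encoding given to the helper and the one given to the reader need to match, otherwise the reader will decode the fields incorrectly.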
Martijn Pieters answered Mar 31 '23 12:03
