I'm having trouble reading a unicode CSV string into python-unicodecsv:
>>> import unicodecsv, StringIO
>>> f = StringIO.StringIO(u'é,é')
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> row = r.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/guy/test/.env/lib/python2.7/site-packages/unicodecsv/__init__.py", line 101, in next
row = self.reader.next()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)
I'm guessing it's an issue with how I convert my unicode string into a StringIO file somehow? The example on the python-unicodecsv github page works fine:
>>> import unicodecsv
>>> from cStringIO import StringIO
>>> f = StringIO()
>>> w = unicodecsv.writer(f, encoding='utf-8')
>>> w.writerow((u'é', u'ñ'))
>>> f.seek(0)
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> row = r.next()
>>> print row[0], row[1]
é ñ
Trying my code with cStringIO fails, as cStringIO can't accept unicode (so why the example works, I don't know!):
>>> from cStringIO import StringIO
>>> f = StringIO(u'é')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)
I need to accept UTF-8 CSV-formatted input from a web textarea form field, so I can't just read it in from a file.
Any ideas?
The unicodecsv reader reads and decodes byte strings for you. You are passing it unicode strings instead. On output (as in the GitHub example, where the writer fills the cStringIO object), your unicode values are encoded to bytestrings for you, using the configured codec.
In addition, cStringIO.StringIO can only handle encoded bytestrings, while the pure-python StringIO.StringIO class happily treats unicode values as if they are byte strings.
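To see the text/bytes distinction in isolation, here is a minimal sketch. It uses io.BytesIO from the standard library instead of cStringIO (my substitution, chosen because it behaves the same way on Python 2 and Python 3): it rejects a unicode value outright and only accepts encoded bytes.

```python
import io

# Unicode text as it might arrive from a form field (two e-acute characters).
text = u'\xe9,\xe9'

try:
    io.BytesIO(text)          # a byte stream refuses unicode text
    accepted_text = True
except TypeError:
    accepted_text = False

raw = text.encode('utf-8')    # explicit step: text -> UTF-8 encoded bytes
stream = io.BytesIO(raw)      # encoded bytes are accepted

# Reading the bytes back and decoding them recovers the original text.
round_tripped = stream.read().decode('utf-8')
```

Each é becomes two bytes in UTF-8, which is the form a byte-oriented CSV reader expects to decode.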
The solution is to encode your unicode values before putting them into the StringIO object:
>>> import unicodecsv, StringIO, cStringIO
>>> f = StringIO.StringIO(u'é,é'.encode('utf8'))
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> next(r)
[u'\xe9', u'\xe9']
>>> f = cStringIO.StringIO(u'é,é'.encode('utf8'))
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> next(r)
[u'\xe9', u'\xe9']
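For the textarea case, the encoding step could be wrapped in a small helper. This is only a sketch: `to_byte_stream` is a hypothetical name of mine, not part of unicodecsv, and I'm assuming the form value may arrive either as unicode text or as already-encoded bytes.

```python
import io


def to_byte_stream(value, encoding='utf-8'):
    """Return a file-like object holding `value` as encoded bytes.

    Hypothetical helper: unicode text is encoded first, while values
    that are already bytes are passed through unchanged.
    """
    if isinstance(value, bytes):
        data = value
    else:
        data = value.encode(encoding)
    return io.BytesIO(data)


# Intended use (not executed here, since it needs unicodecsv installed):
#   reader = unicodecsv.reader(to_byte_stream(form_text), encoding='utf-8')
```

The stream it returns holds bytestrings, so it should be usable in place of the StringIO objects in the session above.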