I'm trying to write out a csv file with Unicode characters, so I'm using the unicodecsv package. Unfortunately, I'm still getting UnicodeDecodeErrors:
# -*- coding: utf-8 -*-
import codecs
import unicodecsv
raw_contents = 'He observes an “Oversized Gorilla” near Ashford'
encoded_contents = unicode(raw_contents, errors='replace')
with codecs.open('test.csv', 'w', 'UTF-8') as f:
    w = unicodecsv.writer(f, encoding='UTF-8')
    w.writerow(["1", encoded_contents])
This is the traceback:
Traceback (most recent call last):
  File "unicode_test.py", line 11, in <module>
    w.writerow(["1", encoded_contents])
  File "/Library/Python/2.7/site-packages/unicodecsv/__init__.py", line 83, in writerow
    self.writer.writerow(_stringify_list(row, self.encoding, self.encoding_errors))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 691, in write
    return self.writer.write(data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 17: ordinal not in range(128)
I thought converting it to Unicode would be good enough, but that doesn't seem to be the case. I'd really like to understand what is happening so that I'm better prepared to handle these errors in other projects in the future.
From the traceback, it looks like I can reproduce the error like this:
>>> raw_contents = 'He observes an “Oversized Gorilla” near Ashford'
>>> raw_contents.encode('UTF-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 15: ordinal not in range(128)
>>>
Up until now, I thought I had a decent working knowledge of handling Unicode text in Python 2.x, but this has humbled me.
You should not use codecs.open() for your file. unicodecsv wraps the csv module, which always writes byte strings to the open file object. To write those byte strings to a Unicode-aware file object such as the one returned by codecs.open(), they are first implicitly decoded, using the default ASCII codec; this is where your UnicodeDecodeError exception stems from.
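You can reproduce that implicit decode without any csv machinery at all. A minimal sketch, using nothing beyond the standard library (demo.txt is just an illustrative filename):

import codecs

# a byte string, which is exactly what the csv module hands to the file
data = u'\u201cOversized Gorilla\u201d'.encode('UTF-8')
with codecs.open('demo.txt', 'w', 'UTF-8') as f:
    # codecs must decode the bytes back to unicode before it can encode
    # them; it uses the default ASCII codec, so this raises UnicodeDecodeError
    f.write(data)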
Use a file in binary mode instead:
with open('test.csv', 'wb') as f:
    w = unicodecsv.writer(f, encoding='UTF-8')
    w.writerow(["1", encoded_contents])
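Now the byte strings produced by unicodecsv reach the file unchanged. If you want to double-check the result, read the raw bytes back and decode them yourself (a quick sanity check on the file written above):

with open('test.csv', 'rb') as f:
    # the file contains plain UTF-8 bytes, so this decodes cleanly
    print f.read().decode('UTF-8')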
Binary mode is not strictly necessary unless your data contains embedded newlines, but the csv module wants to control how newlines are written to ensure that such values are handled correctly. However, not using codecs.open() is an absolute requirement.
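For example, a field value containing a newline only round-trips correctly because binary mode leaves line-ending handling to the csv module. A short sketch (multiline.csv is a hypothetical filename):

import unicodecsv

with open('multiline.csv', 'wb') as f:
    w = unicodecsv.writer(f, encoding='UTF-8')
    # the embedded newline is preserved inside a quoted field rather
    # than being translated or splitting the row in two
    w.writerow([u'line one\nline two', u'plain'])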
The same thing happens when you call .encode() on a byte string: you already have encoded data there, so Python first implicitly decodes it (with ASCII again) to get a Unicode value it can encode.
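You can watch that implicit step happen in the interpreter, using the same raw_contents byte string from your question:

>>> raw_contents = 'He observes an “Oversized Gorilla” near Ashford'
>>> raw_contents.decode('ascii')  # the implicit first step of .encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 15: ordinal not in range(128)
>>> raw_contents.decode('UTF-8')  # decoding with the right codec works
u'He observes an \u201cOversized Gorilla\u201d near Ashford'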