 

CSV, DictWriter, unicode and utf-8

I am having problems with the DictWriter and non-ascii characters. A short version of my problem:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import codecs
import csv

f = codecs.open("test.csv", 'w', 'utf-8')
writer = csv.DictWriter(f, ['field1'], delimiter='\t')
writer.writerow({'field1':u'å'.encode('utf-8')})
f.close()

Gives this Traceback:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    writer.writerow({'field1':u'å'.encode('utf-8')})
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/csv.py", line 124, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/codecs.py", line 638, in write
    return self.writer.write(data)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/codecs.py", line 303, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

I am a bit lost, because from what I have read in the documentation, DictWriter ought to be able to work with UTF-8.

Joel, asked Jul 19 '10 at 22:07





1 Answer

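The object you obtain with codecs.open wants a unicode string in its write method; that's the whole point. csv.DictWriter, however, calls that method with the UTF-8-encoded byte string you gave it. Python 2 then implicitly decodes that byte string with the default ASCII codec before re-encoding it as UTF-8, and the ASCII decode fails on the first non-ASCII byte (0xc3), hence the exception.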
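To make the hidden step concrete, here is a minimal Python 3 sketch (not part of the original answer) that performs explicitly what Python 2's codecs StreamWriter does implicitly when it receives a byte string: decode with the ASCII default codec, then encode as UTF-8. The helper name `implicit_python2_reencode` is hypothetical and exists only for illustration.

```python
# Python 3 sketch of the implicit conversion Python 2 performs when a
# byte string reaches a UTF-8 codecs StreamWriter: bytes -> text via the
# default ASCII codec, then text -> UTF-8 bytes.
utf8_bytes = u"å".encode("utf-8")  # b'\xc3\xa5', the value the question passes in


def implicit_python2_reencode(data: bytes) -> bytes:
    """Hypothetical helper mimicking Python 2's str -> unicode -> UTF-8 path."""
    text = data.decode("ascii")  # fails for any byte >= 0x80
    return text.encode("utf-8")


error_message = None
try:
    implicit_python2_reencode(utf8_bytes)
except UnicodeDecodeError as exc:
    # Same message as the last line of the question's traceback.
    error_message = str(exc)
    print(error_message)
```

Pure-ASCII byte strings survive the round trip unchanged, which is why this kind of bug often goes unnoticed until the first accented character appears in the data.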

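Change f's creation to f = open("test.csv", 'wb') (taking codecs out of the picture) and things should work just fine: the values are already UTF-8-encoded byte strings, so the csv module writes them to the file unchanged.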
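The answer targets Python 2, where the csv module works with byte strings. On Python 3 the csv module works with str instead, so the analogous fix is the reverse: open the file in text mode with an explicit encoding, pass plain str values without calling .encode, and let the file object do the UTF-8 encoding. The sketch below assumes Python 3 and writes to a temporary file; the path is illustrative only.

```python
import csv
import os
import tempfile

# Python 3: csv handles str, and the text-mode file encodes to UTF-8.
path = os.path.join(tempfile.mkdtemp(), "test.csv")

with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, ["field1"], delimiter="\t")
    writer.writerow({"field1": "å"})  # a str, not encoded bytes

# Inspect the raw bytes to confirm UTF-8 was written.
with open(path, "rb") as f:
    raw = f.read()

# Read the row back as text with the same encoding and dialect settings.
with open(path, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f, ["field1"], delimiter="\t"))

print(raw)
print(rows)
```

Passing `newline=""` follows the csv module documentation's recommendation for files handed to csv writers and readers; without it, the `\r\n` row terminator that DictWriter emits by default could be translated again on some platforms.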

Alex Martelli, answered Sep 29 '22 at 23:09
