I am having problems with the DictWriter and non-ascii characters. A short version of my problem:
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import codecs
import csv
f = codecs.open("test.csv", 'w', 'utf-8')
writer = csv.DictWriter(f, ['field1'], delimiter='\t')
writer.writerow({'field1':u'å'.encode('utf-8')})
f.close()
```
Gives this Traceback:
```
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    writer.writerow({'field1':u'å'.encode('utf-8')})
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/csv.py", line 124, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/codecs.py", line 638, in write
    return self.writer.write(data)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/codecs.py", line 303, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
```
I am a bit lost, because from what I have read in the documentation, DictWriter ought to be able to work with UTF-8.
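The last line of the traceback above can be reproduced on its own. This is a minimal sketch in Python 3 syntax (an assumption: the original script targets Python 2.5, where the codec performs this ASCII decode implicitly before encoding). The UTF-8 encoding of å begins with the byte 0xc3, which the ASCII codec rejects.

```python
# Python 3 sketch isolating the failing step from the traceback.
encoded = "\u00e5".encode("utf-8")  # "å" as UTF-8 bytes, as in the script

message = None
try:
    # Python 2's codecs writer implicitly decoded byte strings as ASCII
    # before re-encoding them; doing that step explicitly fails the same way.
    encoded.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)

print(encoded)
print(message)
```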
Among Excel's save formats, one saves a comma-separated document for use on the MS-DOS operating system, while CSV UTF-8 (comma delimited) uses UTF-8 (Unicode Transformation Format, 8-bit), an encoding that supports many special characters, including hieroglyphs and accented characters, and is backward compatible with ASCII.
For special and multi-byte characters to import correctly, the CSV file must be saved in UTF-8 encoding (RFC 4180 defines the CSV format itself, not a character encoding). Utilities such as Notepad++ can save a file in UTF-8.
In Excel, go to Data → Get External Data → From Text, navigate to the CSV file you want to import, choose the Delimited option, and set File Origin (the character encoding) to 65001: Unicode (UTF-8) in the drop-down list.
On Windows, if you double-click a CSV file, Microsoft Excel opens it using the Windows-1252 encoding unless the file starts with a UTF-8 byte-order mark, so UTF-8 text without one can appear garbled.
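If a CSV written from Python should open correctly when double-clicked in Excel, one common workaround is to begin the file with a UTF-8 byte-order mark, which Python 3 writes when the file is opened with the utf-8-sig codec. This is a minimal sketch, assuming Python 3 and a hypothetical file name excel_test.csv in the temp directory; whether a particular Excel version honours the BOM is an assumption worth checking on the target machine.

```python
import csv
import os
import tempfile

# Hypothetical output path, for illustration only.
path = os.path.join(tempfile.gettempdir(), "excel_test.csv")

# "utf-8-sig" writes the byte-order mark EF BB BF before the first data;
# newline="" lets the csv module control line endings itself.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    writer = csv.DictWriter(f, ["field1"])
    writer.writeheader()
    writer.writerow({"field1": "\u00e5"})  # "å"

# Read the raw bytes back to see the BOM and the UTF-8 data.
with open(path, "rb") as f:
    raw = f.read()
print(raw)
```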
The object you obtain with codecs.open wants a unicode string in its write method; that's the whole point. csv.DictWriter, of course, is calling that method with a UTF-8-encoded byte string instead, whence the exception: the codec first tries to decode that byte string implicitly with the default ASCII codec, which fails on byte 0xc3 (the first byte of å in UTF-8).
Change f's creation to f = open("test.csv", 'wb') (taking codecs out of the picture) and things should work just fine.
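The fix above applies to Python 2, where the csv module works with byte strings. In Python 3 the csv module expects text, so a rough equivalent (a sketch, not part of the original answer) opens the file in text mode with an explicit encoding and passes the unicode string directly, without calling .encode().

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "test.csv")

# Text mode with an explicit encoding: the file object does the UTF-8
# encoding, so the row value stays a str. newline="" is what the csv
# documentation recommends for files handed to csv writers.
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, ["field1"], delimiter="\t")
    writer.writerow({"field1": "\u00e5"})  # "å", not "å".encode("utf-8")

# The bytes on disk are the UTF-8 encoding of "å" plus the default "\r\n".
with open(path, "rb") as f:
    raw = f.read()
print(raw)
```

Passing the encoded bytes here would not raise in Python 3; the csv module would convert the bytes object with str() and write the literal text b'\xc3\xa5' into the cell instead.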