Python DictWriter writing UTF-8 encoded CSV files

Tags: python, csv, unicode, utf-8

I have a list of dictionaries containing unicode strings.
csv.DictWriter can write a list of dictionaries into a CSV file.
I want the CSV file to be encoded in UTF8.
The csv module cannot handle converting unicode strings into UTF8.

The csv module documentation has an example for converting everything to UTF8:

    def utf_8_encoder(unicode_csv_data):
        for line in unicode_csv_data:
            yield line.encode('utf-8')

It also has a UnicodeWriter class.

But... how do I make DictWriter work with these? Wouldn't they have to inject themselves in the middle of it, to catch the disassembled dictionaries and encode them before it writes them to the file? I don't get it.

428

asked Apr 30 '11 00:04

endolith

1 Answer

UPDATE: The third-party unicodecsv module implements this seven-year-old answer for you. An example follows the original code. There's also a Python 3 solution that doesn't require a third-party module.

Original Python 2 Answer

If using Python 2.7 or later, use a dict comprehension to remap the dictionary to utf-8 before passing to DictWriter:

    # coding: utf-8
    import csv

    D = {'name':u'马克','pinyin':u'mǎkè'}
    f = open('out.csv','wb')
    f.write(u'\ufeff'.encode('utf8')) # BOM (optional...Excel needs it to open UTF-8 file properly)
    w = csv.DictWriter(f,sorted(D.keys()))
    w.writeheader()
    w.writerow({k:v.encode('utf8') for k,v in D.items()})
    f.close()
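One caveat with the dict comprehension above: it calls `.encode()` on every value, so a row containing a non-string value such as an int or `None` would raise `AttributeError`. A minimal sketch of a helper that encodes only text values and passes everything else through might look like this. The name `encode_row` and the `age` field are hypothetical and not part of the original answer; the helper targets the Python 2 approach, since on Python 3 no manual encoding is needed.

```python
# coding: utf-8

def encode_row(row, encoding='utf-8'):
    """Return a copy of row with text values encoded to bytes.

    Non-text values (ints, floats, None, ...) are passed through
    unchanged so the csv module can stringify them itself.
    """
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    return {
        key: value.encode(encoding) if isinstance(value, text_type) else value
        for key, value in row.items()
    }

# Hypothetical row mixing text and a number.
row = {'name': u'马克', 'age': 30}
encoded = encode_row(row)
```

With this in place, `w.writerow(encode_row(D))` could stand in for the inline comprehension.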

You can use this idea to update UnicodeWriter to DictUnicodeWriter:

    # coding: utf-8
    import csv
    import cStringIO
    import codecs

    class DictUnicodeWriter(object):

        def __init__(self, f, fieldnames, dialect=csv.excel, encoding="utf-8", **kwds):
            # Redirect output to a queue
            self.queue = cStringIO.StringIO()
            self.writer = csv.DictWriter(self.queue, fieldnames, dialect=dialect, **kwds)
            self.stream = f
            self.encoder = codecs.getincrementalencoder(encoding)()

        def writerow(self, D):
            self.writer.writerow({k:v.encode("utf-8") for k,v in D.items()})
            # Fetch UTF-8 output from the queue ...
            data = self.queue.getvalue()
            data = data.decode("utf-8")
            # ... and reencode it into the target encoding
            data = self.encoder.encode(data)
            # write to the target stream
            self.stream.write(data)
            # empty queue
            self.queue.truncate(0)

        def writerows(self, rows):
            for D in rows:
                self.writerow(D)

        def writeheader(self):
            # The header stays in the queue until the next writerow() flushes it
            self.writer.writeheader()

    D1 = {'name':u'马克','pinyin':u'Mǎkè'}
    D2 = {'name':u'美国','pinyin':u'Měiguó'}
    f = open('out.csv','wb')
    f.write(u'\ufeff'.encode('utf8')) # BOM (optional...Excel needs it to open UTF-8 file properly)
    w = DictUnicodeWriter(f,sorted(D1.keys())) # take the field names from the first row
    w.writeheader()
    w.writerows([D1,D2])
    f.close()

Python 2 unicodecsv Example:

    # coding: utf-8
    import unicodecsv as csv

    D = {u'name':u'马克',u'pinyin':u'mǎkè'}

    with open('out.csv','wb') as f:
        w = csv.DictWriter(f,fieldnames=sorted(D.keys()),encoding='utf-8-sig')
        w.writeheader()
        w.writerow(D)

Python 3:

Additionally, Python 3's built-in csv module supports Unicode natively:

    # coding: utf-8
    import csv

    D = {u'name':u'马克',u'pinyin':u'mǎkè'}

    # Use newline='' instead of 'wb' in Python 3.
    with open('out.csv','w',encoding='utf-8-sig',newline='') as f:
        w = csv.DictWriter(f,fieldnames=sorted(D.keys()))
        w.writeheader()
        w.writerow(D)
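Since the question starts from a list of dictionaries, here is a rough Python 3 sketch that writes several rows with `writerows()` and then reads the file back to check the result. The temporary directory and the variable names are assumptions made for illustration; the rows reuse the sample data from the Python 2 example.

```python
import csv
import os
import tempfile

rows = [
    {'name': '马克', 'pinyin': 'Mǎkè'},
    {'name': '美国', 'pinyin': 'Měiguó'},
]
fieldnames = sorted(rows[0].keys())

# Write into a temporary directory so the sketch leaves no file behind
# in the current working directory.
path = os.path.join(tempfile.mkdtemp(), 'out.csv')

# 'utf-8-sig' prepends the BOM; newline='' lets csv control line endings.
with open(path, 'w', encoding='utf-8-sig', newline='') as f:
    w = csv.DictWriter(f, fieldnames=fieldnames)
    w.writeheader()
    w.writerows(rows)

# Inspect the raw bytes to see the BOM at the start of the file.
with open(path, 'rb') as f:
    raw = f.read()

# Reading with 'utf-8-sig' strips the BOM again, so the first
# header name comes back clean.
with open(path, 'r', encoding='utf-8-sig', newline='') as f:
    read_back = [dict(r) for r in csv.DictReader(f)]
```

If the BOM is not wanted (it mainly helps Excel detect UTF-8), passing `encoding='utf-8'` to both `open()` calls should work the same way without it.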

159

answered Sep 27 '22 21:09

Mark Tolonen
