Python DictWriter writing UTF-8 encoded CSV files

  1. I have a list of dictionaries containing unicode strings.
  2. csv.DictWriter can write a list of dictionaries into a CSV file.
  3. I want the CSV file to be encoded in UTF8.
  4. The csv module cannot handle converting unicode strings into UTF8.
  5. The csv module documentation has an example for converting everything to UTF8:

    def utf_8_encoder(unicode_csv_data):
        for line in unicode_csv_data:
            yield line.encode('utf-8')

  6. It also has a UnicodeWriter class.

But... how do I make DictWriter work with these? Wouldn't they have to inject themselves in the middle of it, to catch the disassembled dictionaries and encode them before it writes them to the file? I don't get it.

endolith asked Apr 30 '11 00:04



1 Answer

UPDATE: The 3rd party unicodecsv module implements this 7-year-old answer for you; an example follows the original code. There's also a Python 3 solution that doesn't require a 3rd party module.

Original Python 2 Answer

If using Python 2.7 (the first 2.x release with dict comprehensions), encode each value as UTF-8 with a dict comprehension before passing the dictionary to DictWriter:

    # coding: utf-8
    import csv

    D = {'name':u'马克','pinyin':u'mǎkè'}
    f = open('out.csv','wb')
    f.write(u'\ufeff'.encode('utf8')) # BOM (optional...Excel needs it to open UTF-8 file properly)
    w = csv.DictWriter(f,sorted(D.keys()))
    w.writeheader()
    w.writerow({k:v.encode('utf8') for k,v in D.items()})
    f.close()

You can use this idea to update UnicodeWriter to DictUnicodeWriter:

    # coding: utf-8
    import csv
    import cStringIO
    import codecs

    class DictUnicodeWriter(object):

        def __init__(self, f, fieldnames, dialect=csv.excel, encoding="utf-8", **kwds):
            # Redirect output to a queue
            self.queue = cStringIO.StringIO()
            self.writer = csv.DictWriter(self.queue, fieldnames, dialect=dialect, **kwds)
            self.stream = f
            self.encoder = codecs.getincrementalencoder(encoding)()

        def writerow(self, D):
            self.writer.writerow({k:v.encode("utf-8") for k,v in D.items()})
            # Fetch UTF-8 output from the queue ...
            data = self.queue.getvalue()
            data = data.decode("utf-8")
            # ... and reencode it into the target encoding
            data = self.encoder.encode(data)
            # write to the target stream
            self.stream.write(data)
            # empty queue
            self.queue.truncate(0)

        def writerows(self, rows):
            for D in rows:
                self.writerow(D)

        def writeheader(self):
            # The header stays in the queue until the first writerow() flushes it
            self.writer.writeheader()

    D1 = {'name':u'马克','pinyin':u'Mǎkè'}
    D2 = {'name':u'美国','pinyin':u'Měiguó'}
    f = open('out.csv','wb')
    f.write(u'\ufeff'.encode('utf8')) # BOM (optional...Excel needs it to open UTF-8 file properly)
    w = DictUnicodeWriter(f,sorted(D1.keys())) # field names come from D1 (D is not defined here)
    w.writeheader()
    w.writerows([D1,D2])
    f.close()
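For comparison, here is a rough sketch of the same queue-and-re-encode pattern in Python 3, where it is only worth the trouble if the target is a binary stream you can't reopen in text mode. The class name DictEncodingWriter is made up for this illustration, and unlike DictUnicodeWriter above it flushes the header immediately instead of waiting for the first row:

```python
import csv
import io


class DictEncodingWriter(object):
    """Hypothetical Python 3 analogue of DictUnicodeWriter (illustration only)."""

    def __init__(self, stream, fieldnames, encoding="utf-8", **kwds):
        # Text queue that csv.DictWriter writes into
        self.queue = io.StringIO(newline="")
        self.writer = csv.DictWriter(self.queue, fieldnames, **kwds)
        self.stream = stream      # binary target stream
        self.encoding = encoding

    def _flush(self):
        # Encode whatever csv put in the queue and pass it on to the target
        self.stream.write(self.queue.getvalue().encode(self.encoding))
        # Empty the queue; truncate() does not move the position, so seek first
        self.queue.seek(0)
        self.queue.truncate(0)

    def writeheader(self):
        self.writer.writeheader()
        self._flush()

    def writerow(self, row):
        self.writer.writerow(row)
        self._flush()

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


out = io.BytesIO()
w = DictEncodingWriter(out, ["name", "pinyin"])
w.writeheader()
w.writerows([
    {"name": "马克", "pinyin": "Mǎkè"},
    {"name": "美国", "pinyin": "Měiguó"},
])
data = out.getvalue()  # UTF-8 bytes of the whole CSV
```

No dict comprehension is needed here because Python 3 strings are already Unicode; the only encoding step is the one in _flush().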

Python 2 unicodecsv Example:

    # coding: utf-8
    import unicodecsv as csv

    D = {u'name':u'马克',u'pinyin':u'mǎkè'}

    with open('out.csv','wb') as f:
        w = csv.DictWriter(f,fieldnames=sorted(D.keys()),encoding='utf-8-sig')
        w.writeheader()
        w.writerow(D)

Python 3:

Additionally, Python 3's built-in csv module supports Unicode natively:

    # coding: utf-8
    import csv

    D = {u'name':u'马克',u'pinyin':u'mǎkè'}

    # Use newline='' instead of 'wb' in Python 3.
    with open('out.csv','w',encoding='utf-8-sig',newline='') as f:
        w = csv.DictWriter(f,fieldnames=sorted(D.keys()))
        w.writeheader()
        w.writerow(D)
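If you want to check the result, a round trip along these lines should do it. This is only a sketch: it writes to a temporary directory rather than out.csv, and the variable names are my own. It uses the same open() arguments as the Python 3 snippet above, then confirms that the file starts with the UTF-8 BOM and that reading it back with 'utf-8-sig' strips the BOM again:

```python
import csv
import os
import tempfile

rows = [
    {"name": "马克", "pinyin": "Mǎkè"},
    {"name": "美国", "pinyin": "Měiguó"},
]

# Temporary location so the example does not overwrite anything
path = os.path.join(tempfile.mkdtemp(), "out.csv")

# Write: 'utf-8-sig' emits the BOM, newline='' lets csv control line endings
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    w = csv.DictWriter(f, fieldnames=sorted(rows[0].keys()))
    w.writeheader()
    w.writerows(rows)

# Look at the raw bytes to see the BOM that Excel relies on
with open(path, "rb") as f:
    raw = f.read()

# Read back: 'utf-8-sig' drops the BOM so the first header is plain 'name'
with open(path, "r", encoding="utf-8-sig", newline="") as f:
    read_back = [dict(r) for r in csv.DictReader(f)]
```

Reading with plain 'utf-8' instead would leave '\ufeff' glued to the first field name, which is the usual source of mysterious KeyErrors on BOM-prefixed files.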
Mark Tolonen answered Sep 27 '22 21:09