
Python CSV DictReader with UTF-8 data

As far as I know, the Python (v2.6) csv module can't handle Unicode data by default, correct? The Python docs include an example of reading from a UTF-8 encoded file, but that example only returns the CSV rows as lists. I'd like to access the row columns by name, as csv.DictReader does, but with a UTF-8 encoded CSV input file.

Can anyone tell me how to do this efficiently? I will have to process CSV files that are hundreds of megabytes in size.

LMatter asked Feb 15 '11 14:02



1 Answer

I came up with an answer myself:

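import csv

def UnicodeDictReader(utf8_data, **kwargs):
    # Python 2 only: csv.DictReader yields byte strings, decoded here row by row.
    csv_reader = csv.DictReader(utf8_data, **kwargs)
    for row in csv_reader:
        # Dict comprehensions need Python 2.7+; on 2.6, build the dict with
        # dict(...) around a generator expression instead.
        yield {unicode(key, 'utf-8'): unicode(value, 'utf-8') for key, value in row.iteritems()}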

Note: This has been updated so that the keys are decoded as well, per the suggestion in the comments.
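
For anyone reading this on Python 3: there the csv module works on text (str) directly, so the decoding can be left to the file object and no wrapper around the values is needed. Below is a minimal sketch under that assumption. The helper name iter_utf8_dict_rows and the sample data are hypothetical, made up for illustration; an in-memory byte buffer stands in for a real UTF-8 file.

```python
import csv
import io


def iter_utf8_dict_rows(stream, **kwargs):
    """Yield one dict per CSV row from an already-decoded text stream.

    Rows are produced lazily, one at a time, so the whole file is never
    held in memory at once.
    """
    for row in csv.DictReader(stream, **kwargs):
        yield row


# In-memory stand-in for a UTF-8 encoded CSV file (sample data is made up).
# For a real file, open(path, encoding="utf-8", newline="") would play the
# same role as the TextIOWrapper below; `path` is hypothetical here.
sample_bytes = "name,city\nZoë,Zürich\nJosé,Málaga\n".encode("utf-8")
stream = io.TextIOWrapper(io.BytesIO(sample_bytes), encoding="utf-8", newline="")

rows = list(iter_utf8_dict_rows(stream))
print(rows[0]["name"], rows[1]["city"])
```

Calling list() here is only for the demonstration; with files of hundreds of megabytes one would loop over the generator directly instead of collecting all rows.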

LMatter answered Sep 23 '22 09:09