 

Reading UTF-8 with BOM using Python CSV module causes unwanted extra characters [duplicate]

I am trying to read a CSV file with Python with the following code:

with open("example.txt") as f:
   c = csv.reader(f)
   for row in c:
      print row

My example.txt has only the following content:

Hello world!

For UTF-8 or ANSI encoded files, this gives me the expected output:

> ["Hello world!"]

But if I save the file as UTF-8 with BOM I get this output:

> ["\xef\xbb\xbfHello world!"]

Since I do not have any control over what files the user will use as input, I would like this to work with BOM as well. How can I fix this problem? Is there anything I need to do to ensure that this works for other encodings as well?

Anders asked Nov 18 '15

1 Answer

You could use the unicodecsv Python module with the 'utf-8-sig' encoding, which transparently skips the BOM when it is present:

import unicodecsv

with open('input.csv', 'rb') as f_input:
    csv_reader = unicodecsv.reader(f_input, encoding='utf-8-sig')
    print list(csv_reader)

So for an input file containing the following in UTF-8 with BOM:

c1,c2,c3,c4,c5,c6,c7,c8
1,2,3,4,5,6,7,8

It would display the following:

[[u'c1', u'c2', u'c3', u'c4', u'c5', u'c6', u'c7', u'c8'], [u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8']]

The unicodecsv module can be installed using pip as follows:

pip install unicodecsv
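
If you are on Python 3, the standard library's csv module can give the same result without an extra dependency, because the built-in 'utf-8-sig' codec strips a leading BOM when decoding. A minimal sketch (not part of the original answer; the file name input.csv is just an example):

import csv

# Python 3 sketch: 'utf-8-sig' removes a leading BOM if one is present
# and reads plain UTF-8 files unchanged.
with open('input.csv', newline='', encoding='utf-8-sig') as f_input:
    csv_reader = csv.reader(f_input)
    print(list(csv_reader))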
Martin Evans answered Nov 01 '22