 

Python read csv - BOM embedded into the first key

I'm using Python 2.7.12. With this code snippet I'm saving a UTF-8 CSV file, and I write the BOM (byte order mark) at the beginning of the file.

import codecs
import csv

outputFile = open("test.csv", "wb")
outputFile.write(codecs.BOM_UTF8)
fieldnames = ["a", "b"]
writer = csv.DictWriter(outputFile, fieldnames, delimiter=";")
writer.writeheader()
row = dict([])
for i in range(10):
    row["a"] = str(i).encode("utf-8")
    row["b"] = str(i*2).encode("utf-8")
    writer.writerow(row)
outputFile.close()
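
For comparison, in Python 3 the BOM does not have to be written by hand: the `utf-8-sig` codec emits it on the first write. A minimal sketch, assuming Python 3 and reusing the file name `test.csv`, the fieldnames and the `;` delimiter from the question:

```python
import codecs
import csv

# Python 3: the 'utf-8-sig' codec writes the EF BB BF byte order mark
# automatically, so codecs.BOM_UTF8 is not written manually here.
with open("test.csv", "w", newline="", encoding="utf-8-sig") as output_file:
    writer = csv.DictWriter(output_file, ["a", "b"], delimiter=";")
    writer.writeheader()
    for i in range(10):
        writer.writerow({"a": i, "b": i * 2})

# Inspect the raw bytes: the BOM comes first, then the header "a;b\r\n".
with open("test.csv", "rb") as raw:
    head = raw.read(7)
print(head.startswith(codecs.BOM_UTF8))  # True
```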

I want to load that csv file:

import codecs
import csv
inputFile = open("test.csv", "rb")
reader = csv.DictReader(inputFile, delimiter=";")
for row in reader:
    print row["a"]
inputFile.close()

The above code fails with KeyError: 'a'. If I print the row keys, this is how they look: [u'\ufeffa', u'b']. The BOM has been embedded into the key a. What am I doing wrong?
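
The key becomes '\ufeffa' because the three BOM bytes are decoded as an ordinary character, U+FEFF, unless the codec is told to expect them. A small Python 3 sketch, independent of the csv module, compares the plain `utf-8` codec with `utf-8-sig` on the header line of such a file:

```python
import codecs

# Header line of the file as raw bytes: UTF-8 BOM followed by "a;b".
raw = codecs.BOM_UTF8 + b"a;b"

# Plain UTF-8 keeps the BOM as U+FEFF glued to the first field name.
plain = raw.decode("utf-8").split(";")
# 'utf-8-sig' strips a leading BOM (and decodes normally if none is present).
sig = raw.decode("utf-8-sig").split(";")

print(plain)  # ['\ufeffa', 'b']
print(sig)    # ['a', 'b']
```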

asked Oct 28 '16 at 17:10 by Davide_sd




2 Answers

You have to tell open that the file is UTF-8 with a BOM. I know this works with io.open:

import io

# ...
inputFile = io.open("test.csv", "r", encoding='utf-8-sig')
# ...

And you have to open the file in text mode, "r" instead of "rb".
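
Python itself enforces the text-mode requirement: `open()` rejects an `encoding` argument in binary mode, because binary mode returns undecoded bytes. A short Python 3 sketch, using a hypothetical throwaway file `demo.csv` laid out like `test.csv`:

```python
import codecs
import os
import tempfile

# Hypothetical throwaway file: UTF-8 with a BOM, same layout as test.csv.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"a;b\r\n0;0\r\n")

# Binary mode yields raw bytes, so it cannot take an encoding argument.
error_message = ""
try:
    open(path, "rb", encoding="utf-8-sig")
except ValueError as exc:
    error_message = str(exc)
print("ValueError:", error_message)

# Text mode with 'utf-8-sig' decodes the bytes and drops the BOM.
with open(path, "r", newline="", encoding="utf-8-sig") as f:
    first_line = f.readline()
print(repr(first_line))  # 'a;b\r\n'
```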

answered Oct 03 '22 at 23:10 by hvwaldow


In Python 3, the built-in open function is an alias for io.open.
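
This can be checked directly in a Python 3 interpreter:

```python
import builtins
import io

# In Python 3 the built-in open and io.open are the same function object.
print(builtins.open is io.open)  # True
```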

All you need to open a file encoded as UTF-8 with a BOM is:

open(path, newline='', encoding='utf-8-sig')

Example

import csv

...

with open(path, newline='', encoding='utf-8-sig') as csv_file:
    reader = csv.DictReader(csv_file, dialect='excel')
    for row in reader:
        print(row['first_name'], row['last_name'])
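
A self-contained variant of the example above, as a sketch: the file location and its `first_name,last_name` contents are hypothetical sample data written with a BOM. It shows the field names `csv.DictReader` sees with `utf-8` versus `utf-8-sig`:

```python
import csv
import os
import tempfile

# Hypothetical sample file: UTF-8 with a BOM, header first_name,last_name.
path = os.path.join(tempfile.mkdtemp(), "people.csv")
with open(path, "w", newline="", encoding="utf-8-sig") as f:
    f.write("first_name,last_name\r\nAda,Lovelace\r\n")

def header(path, encoding):
    """Return the field names csv.DictReader reports for the given encoding."""
    with open(path, newline="", encoding=encoding) as csv_file:
        return csv.DictReader(csv_file, dialect="excel").fieldnames

print(header(path, "utf-8"))      # ['\ufefffirst_name', 'last_name']
print(header(path, "utf-8-sig"))  # ['first_name', 'last_name']

# With 'utf-8-sig', looking up a column by name works as expected.
with open(path, newline="", encoding="utf-8-sig") as csv_file:
    for row in csv.DictReader(csv_file, dialect="excel"):
        print(row["first_name"], row["last_name"])  # Ada Lovelace
```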

answered Oct 03 '22 at 23:10 by Christopher Peisert