
CSV reader picks up garbage in the first few characters

I am trying to read the first line of a CSV file and assign it to header. The CSV file looks like this:

TIME,DAY,MONTH,YEAR
"3:21","23","FEB","2018"
"3:23","23","FEB","2018"
...

Here is the code:

import csv

with open("20180223.csv") as csvfile:
    rdr = csv.reader(csvfile)
    header = next(rdr)
    print(header)

I expect the output to look like:

['TIME', 'DAY', 'MONTH', 'YEAR']

However the output looks like this:

['ï»¿TIME', 'DAY', 'MONTH', 'YEAR']

What did I miss?

Joshua Yonathan asked Mar 28 '18 19:03


1 Answer

Those leading characters are the byte order mark (BOM) at the start of the file, which open() does not strip by default.

Try this:

with open("20180223.csv", encoding="utf-8-sig") as csvfile:

This advice is somewhat hidden away in the documentation, but it is there:

In some areas, it is also convention to use a “BOM” at the start of UTF-8 encoded files; the name is misleading since UTF-8 is not byte-order dependent. The mark simply announces that the file is encoded in UTF-8. Use the ‘utf-8-sig’ codec to automatically skip the mark if present for reading such files.
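
To see the difference the codec makes, here is a minimal, self-contained sketch. It does not touch the original 20180223.csv; instead it writes a small stand-in file that begins with the UTF-8 BOM bytes and reads the header twice. The helper name read_header and the temporary file are assumptions made for illustration, and newline="" follows the csv module's general recommendation rather than anything specific to this problem.

```python
import csv
import os
import tempfile


def read_header(path, encoding):
    """Return the first row of the CSV at `path`, decoded with `encoding`."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, encoding=encoding, newline="") as csvfile:
        return next(csv.reader(csvfile))


# Build a stand-in for the real file: UTF-8 BOM bytes followed by CSV text.
sample = 'TIME,DAY,MONTH,YEAR\r\n"3:21","23","FEB","2018"\r\n'
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"\xef\xbb\xbf" + sample.encode("utf-8"))
    demo_path = tmp.name

with_bom = read_header(demo_path, "utf-8")         # BOM stays attached to 'TIME'
without_bom = read_header(demo_path, "utf-8-sig")  # BOM is skipped if present
os.remove(demo_path)

print(with_bom)
print(without_bom)
```

With plain "utf-8" the first field still carries the BOM as an extra leading character, while "utf-8-sig" returns a clean 'TIME'. The "utf-8-sig" codec also reads files that have no BOM, so it should be safe to use even when only some of the CSV files carry the mark.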

sjw answered Oct 20 '22 19:10