CSV reader picks up garbage in the first few characters

I am trying to read the first line of a CSV file and assign it to header. The CSV file looks like this:

TIME,DAY,MONTH,YEAR
"3:21","23","FEB","2018"
"3:23","23","FEB","2018"
...

Here is the code:

import csv

with open("20180223.csv") as csvfile:
    rdr = csv.reader(csvfile)
    header = next(rdr)
    print(header)

I expect the output to look like:

['TIME', 'DAY', 'MONTH', 'YEAR']

However the output looks like this:

['ï»¿TIME', 'DAY', 'MONTH', 'YEAR']

What did I miss?

asked Mar 28 '18 19:03

Joshua Yonathan

1 Answer

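Those first three characters (ï»¿) are the UTF-8 byte order mark (BOM). The file starts with the bytes EF BB BF, and open() is decoding them with a single-byte codec such as cp1252 (most likely your platform default), so each byte shows up as its own character.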
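If you want to see where the garbage comes from, here is a small sketch. It assumes the default encoding on your machine is cp1252, which is a guess based on the output in the question and not something the question states.

```python
import codecs

# The UTF-8 BOM is the three bytes EF BB BF.
bom = codecs.BOM_UTF8

# Decoded with cp1252 (assumed platform default), each byte becomes
# a separate character: this is the "garbage" in front of TIME.
garbage = bom.decode("cp1252")
print(garbage)

# Decoded as UTF-8, the same three bytes are the single code point U+FEFF.
as_utf8 = bom.decode("utf-8")
print(repr(as_utf8))
```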

Try this:

with open("20180223.csv", encoding="utf-8-sig") as csvfile:

This advice is somewhat buried in the Python documentation, but it is there:

In some areas, it is also convention to use a “BOM” at the start of UTF-8 encoded files; the name is misleading since UTF-8 is not byte-order dependent. The mark simply announces that the file is encoded in UTF-8. Use the ‘utf-8-sig’ codec to automatically skip the mark if present for reading such files.
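For a complete picture, here is a self-contained sketch that compares the two codecs. The temporary sample.csv file and the read_header helper are made up for this illustration; they stand in for the 20180223.csv file from the question.

```python
import csv
import os
import tempfile

# Hypothetical sample file: same layout as in the question, saved with a UTF-8 BOM.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    f.write('TIME,DAY,MONTH,YEAR\r\n"3:21","23","FEB","2018"\r\n')

def read_header(path, encoding):
    """Return the first row of the CSV file, decoded with the given codec."""
    # newline="" is what the csv module docs recommend for files given to csv.reader
    with open(path, encoding=encoding, newline="") as csvfile:
        return next(csv.reader(csvfile))

# Plain utf-8 keeps the BOM as an invisible '\ufeff' glued to the first field.
header_plain = read_header(path, "utf-8")
print(header_plain)

# utf-8-sig skips the BOM, so the first field is clean.
header_sig = read_header(path, "utf-8-sig")
print(header_sig)
```

The utf-8-sig codec only skips the mark if it is present, so it also reads UTF-8 files that have no BOM.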

answered Oct 20 '22 19:10

sjw

Tags: python, python-3.x, csv