
python pandas reading csv file error in column name

I have a CSV file whose header and first two data rows look like this:

NewDateTime ResourceName    
9/18/12 1:00    ANACACHO_ANA    
9/18/12 2:00    ANACACHO_ANA    

When I read it into a pandas DataFrame with:

df1 = pd.read_csv(r'MyFile.csv')

I get

df1.columns
Index([u'NewDateTime', u'ResourceName'], dtype='object')

However, when I try

df1['NewDateTime']

I get the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 5: ordinal not in range(128)

Also, in my PyCharm interpreter df1['NewDateTime'] shows up with a little dash, as in df1['-NewDateTime'], but the dash disappears when I paste it here.

Zanam asked Feb 11 '26 13:02


1 Answer

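It looks like your CSV file starts with a BOM (Byte Order Mark) signature. The u'\ufeff' in your error message is the BOM character, and it is stuck to the front of the first column name, so 'NewDateTime' on its own does not match. Try parsing with 'utf-8-sig', 'utf-16' or another BOM-aware encoding: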

df = pd.read_csv(r'MyFile.csv', encoding='utf-8-sig')
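If you want to confirm that the BOM is really there before changing the encoding, you can look at the first bytes of the file. This is only a minimal sketch using the standard library; the helper name has_utf8_bom and the temporary sample file are made up for illustration, and the sample assumes a comma-separated layout:

```python
import codecs
import os
import tempfile


def has_utf8_bom(path):
    """Return True if the file starts with the 3-byte UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


# Build a small sample file that starts with a BOM, mimicking the CSV in the
# question (hypothetical data, comma-separated by assumption).
sample = codecs.BOM_UTF8 + b"NewDateTime,ResourceName\n9/18/12 1:00,ANACACHO_ANA\n"
fd, sample_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(sample)

bom_found = has_utf8_bom(sample_path)
os.remove(sample_path)  # clean up the temporary file
print(bom_found)
```

On your own data you would call has_utf8_bom(r'MyFile.csv') instead. This only checks for the UTF-8 BOM; a UTF-16 file starts with a different signature.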

Here is a small demo:

In [18]: pd.read_csv(fn).columns
Out[18]: Index([u'?NewDateTime', u'ResourceName'], dtype='object')

In [19]: pd.read_csv(fn, encoding='utf-8-sig').columns
Out[19]: Index([u'NewDateTime', u'ResourceName'], dtype='object')

In my IPython terminal the BOM is shown as ? in u'?NewDateTime'; in your case it shows up as a dash: df1['-NewDateTime']
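The same effect can be reproduced without pandas, which may help to see what the two encodings do differently. A rough sketch, assuming comma-separated sample data; read_header is a hypothetical helper, not part of any library:

```python
import codecs
import csv
import io

# Hypothetical sample bytes: a UTF-8 BOM followed by a header and one data row.
raw = codecs.BOM_UTF8 + b"NewDateTime,ResourceName\n9/18/12 1:00,ANACACHO_ANA\n"


def read_header(data, encoding):
    """Decode the bytes with the given encoding and return the first CSV row."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text)))


plain_header = read_header(raw, "utf-8")     # BOM stays attached to the first name
sig_header = read_header(raw, "utf-8-sig")   # BOM is dropped while decoding

# Fallback if the file cannot be re-read: strip the BOM character from the names.
cleaned = [name.lstrip("\ufeff") for name in plain_header]

print([ascii(name) for name in plain_header])
print(sig_header)
print(cleaned)
```

Re-reading with 'utf-8-sig' is usually the cleaner fix; stripping '\ufeff' from the column names afterwards is just a workaround for when the DataFrame is already loaded.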

MaxU - stop WAR against UA answered Feb 14 '26 03:02


