Unicode Encode Error when writing pandas df to csv

I cleaned 400 Excel files, read them into Python using pandas, and appended all the raw data into one big DataFrame.

Then when I try to export it to a csv:

df.to_csv("path",header=True,index=False) 

I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xc7' in position 20: ordinal not in range(128) 

Can someone suggest a way to fix this and what it means?

Thanks

collarblind asked Jul 10 '15 02:07



2 Answers

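You have unicode values in your DataFrame. Files store bytes, which means all unicode text has to be encoded into bytes before it can be stored in a file. The error means that the ascii codec was used and it cannot represent the character u'\xc7' (Ç), since ascii only covers code points 0 to 127. You have to specify an encoding that can represent your data, such as utf-8. For example,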

df.to_csv('path', header=True, index=False, encoding='utf-8') 

If you don't specify an encoding, then the encoding used by df.to_csv defaults to ascii in Python2, or utf-8 in Python3.
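To see what the error message is saying without involving pandas at all, here is a minimal Python 3 sketch. The helper name `try_encode` is made up for illustration; it only wraps the built-in `str.encode` so that the failure can be inspected instead of raised.

```python
# The character from the error message: U+00C7 (capital C with cedilla).
text = "\u00c7"


def try_encode(value, encoding):
    """Return the encoded bytes, or None if the codec cannot represent the text.

    Hypothetical helper for illustration; it just wraps str.encode.
    """
    try:
        return value.encode(encoding)
    except UnicodeEncodeError:
        return None


# ascii only covers code points 0-127, so this one fails.
ascii_bytes = try_encode(text, "ascii")

# utf-8 can represent every Unicode code point; this one needs two bytes.
utf8_bytes = try_encode(text, "utf-8")

print(ascii_bytes)
print(utf8_bytes)
```

Passing encoding='utf-8' to df.to_csv asks pandas to do the second kind of encoding when it writes the file.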

unutbu answered Sep 30 '22 16:09


Adding an answer to help myself google it later:

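One trick that helped me is to encode a problematic series with unicode-escape first, then decode the result back to a string. Note that this replaces every non-ASCII character with an escape sequence such as \xc7, so the stored text is no longer the original characters. Like: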

df['crumbs'] = df['crumbs'].map(lambda x: x.encode('unicode-escape').decode('utf-8')) 

This would get the dataframe to print correctly too.
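As a rough illustration of what that round trip does to a single value, here is a small Python 3 sketch. The sample string is invented for the example, and the variable names are placeholders.

```python
# Invented sample value containing one non-ASCII character (U+00E7).
original = "Gar\u00e7on"

# Same transformation as in the lambda above: the non-ASCII character
# becomes a literal backslash escape, leaving pure ASCII text.
escaped = original.encode("unicode-escape").decode("utf-8")

# The escaped text can now be encoded as ascii without an error.
ascii_safe = escaped.encode("ascii")

# The original text is recoverable by decoding the escapes again.
restored = ascii_safe.decode("unicode-escape")

print(escaped)
print(restored)
```

So the trick avoids the error by changing the data rather than the file encoding. Anyone reading the CSV later has to undo the escaping to get the real characters back.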

tangfucius answered Sep 30 '22 16:09