Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: ascii codec cant encode character in position ordinal not in range - which cell?

I have great problems with Pandas and Excel. I read in an Excel document into a dataframe and that is fine. I do calculations, and all is well. Then I try to save the dataframe to inspect the results from the calculations, and pandas blows up with this error message:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 4: ordinal not in range(128)

This usually indicates that there is some non ascii character in some cells in the dataframe. My usual approach is to drop that column because I just want to analyse the numbers, not strings.

However, which column should I drop? I have no clue. Ideally I would like pandas to tell me which cells are problematic, or which columns are problematic. As of now, pandas dont tell me nothing, so I need to manually inspect every cell to find the culprit. What I do is that I print out the bad character, and manually inspect every cell for that printed character. This is not doable. So I drop all string columns which is not doable either. How to solve this problem? How can I find the cell or column, that gives me problems with bad characters?

I have tried to convert everything to utf-8, sandwiching, etc etc but I can not get it to work. So I need a way to find the problematic cell so I can manually delete the string.

EDIT: Solved! As bdiamante suggested, his link shows a solution to this problem. For future reference, this works:

df.to_csv('file.csv',encoding='utf-8-sig')
like image 905
Orvar Korvar Avatar asked Jun 19 '17 12:06

Orvar Korvar


1 Answers

Answered within the question, copied here:

df.to_csv('file.csv',encoding='utf-8-sig')
like image 59
Shovalt Avatar answered Oct 20 '22 01:10

Shovalt