I'm having real trouble with pandas and Excel. I read an Excel document into a DataFrame, and that is fine. I do calculations, and all is well. Then, when I try to save the DataFrame to inspect the results of the calculations, pandas blows up with this error message:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 4: ordinal not in range(128)
This usually indicates that some cells in the DataFrame contain non-ASCII characters. My usual approach is to drop the offending column, because I only want to analyse the numbers, not the strings.
However, which column should I drop? I have no clue. Ideally, I would like pandas to tell me which cells or columns are problematic. As it stands, pandas tells me nothing, so I have to inspect every cell manually to find the culprit: I print out the bad character and then look for it cell by cell. That is not feasible. Dropping every string column is not feasible either. How can I find the cell, or column, that is causing the problem with bad characters?
I have tried converting everything to UTF-8, the "unicode sandwich" approach, etc., but I cannot get it to work. So I need a way to find the problematic cell so I can delete the string manually.
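One way to locate the offending cells yourself is to scan the string columns and try to ASCII-encode each value, collecting the ones that fail. This is a minimal sketch, not anything built into pandas; the DataFrame and the `find_non_ascii_cells` helper below are made up for illustration:

```python
import pandas as pd

# Hypothetical example frame; in practice this would come from read_excel.
df = pd.DataFrame({
    "name": ["Ana", "Jos\xe9", "Bob"],
    "city": ["Oslo", "C\xf3rdoba", "Lima"],
    "score": [1, 2, 3],
})

def find_non_ascii_cells(frame):
    """Return (row_label, column, value) for every string cell
    that contains a non-ASCII character."""
    hits = []
    # Only string/object columns can hold problematic text.
    for col in frame.select_dtypes(include="object").columns:
        for idx, val in frame[col].items():
            if isinstance(val, str):
                try:
                    val.encode("ascii")
                except UnicodeEncodeError:
                    hits.append((idx, col, val))
    return hits

print(find_non_ascii_cells(df))
# e.g. [(1, 'name', 'José'), (1, 'city', 'Córdoba')]
```

With the offending row labels and column names in hand, you can drop or clean exactly those cells instead of discarding whole columns.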
EDIT: Solved! The link bdiamante suggested shows a solution to this problem. For future reference, this works:
df.to_csv('file.csv',encoding='utf-8-sig')