
Pandas DataFrame's accented characters appearing garbled in Excel

With:

# -*- coding: utf-8 -*-

at the top of my .ipynb, Jupyter is now displaying accented characters correctly.

When I export a pandas DataFrame containing accented characters to CSV (with .to_csv()), the characters do not render properly when the CSV is opened in Excel.

This happens whether or not I set encoding='utf-8'. Is pandas/Python doing all that it can here, making this an Excel issue? Or can something be done before the export to CSV?

  • Python: 2.7.10
  • Pandas: 0.17.1
  • Excel: Excel for Mac 2011
Pyderman asked Mar 30 '16 02:03



5 Answers

If you want to keep the accents, try encoding='iso-8859-1':

df.to_csv(path, encoding='iso-8859-1', sep=';')
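
To see why the encoding matters, here is a minimal Python 3 sketch using only the standard library. It does not reproduce what Excel does internally: ISO-8859-1 and Mac Roman are used purely as illustrative single-byte encodings, and which one a given Excel version assumes is an assumption. It shows how UTF-8 bytes turn into garbage when read with a single-byte encoding, and why iso-8859-1 only helps when every character in the data exists in that encoding:

```python
text = u'\u00e9'  # e with acute accent

utf8_bytes = text.encode('utf-8')          # two bytes: 0xC3 0xA9
latin1_bytes = text.encode('iso-8859-1')   # one byte: 0xE9

# Reading UTF-8 bytes with a single-byte encoding yields one wrong
# character per byte, which is the typical "garbled accent" symptom.
as_latin1 = utf8_bytes.decode('iso-8859-1')
as_mac_roman = utf8_bytes.decode('mac_roman')

# Characters outside ISO-8859-1 (the euro sign, U+20AC, for example)
# cannot be written with that encoding at all.
try:
    u'\u20ac'.encode('iso-8859-1')
    euro_fits = True
except UnicodeEncodeError:
    euro_fits = False
```

If the data contains such characters, to_csv with encoding='iso-8859-1' would be expected to fail with a UnicodeEncodeError rather than write a file.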
Juliana Rivera answered Oct 26 '22 22:10



I had a similar problem, also on a Mac. I noticed that the unicode string showed up fine when I opened the csv in TextEdit, but showed up garbled when I opened it in Excel.

Thus, I don't think there is any way to successfully export unicode to Excel with to_csv, but I'd expect the default to_excel writer to suffice.

df.to_excel('file.xlsx', encoding='utf-8')
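
If a CSV file is still required, one commonly suggested variant is to write it with a UTF-8 byte order mark (BOM) via encoding='utf-8-sig', which some Excel versions use to detect UTF-8. A minimal sketch, assuming Python 3 and a recent pandas; the sample data and file name are made up for illustration, and whether Excel for Mac 2011 honors the BOM is not verified here:

```python
import codecs
import os
import tempfile

import pandas as pd

# Hypothetical sample data with accented characters.
df = pd.DataFrame({'name': [u'Jos\u00e9', u'Zo\u00eb'], 'score': [1, 2]})

path = os.path.join(tempfile.mkdtemp(), 'accents.csv')

# 'utf-8-sig' is UTF-8 preceded by a byte order mark.
df.to_csv(path, index=False, encoding='utf-8-sig')

with open(path, 'rb') as f:
    raw = f.read()

# True when the file begins with the three BOM bytes EF BB BF.
starts_with_bom = raw.startswith(codecs.BOM_UTF8)

# Reading the file back with the same encoding strips the BOM again.
round_trip = pd.read_csv(path, encoding='utf-8-sig')
```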
Selah answered Oct 26 '22 23:10



I ran into the same problem. When I checked the DataFrame in the Jupyter notebook, everything was in order.

The problem happens when I open the file directly (since it has a .csv extension, Excel can open it directly).

The solution for me was to open a new blank Excel workbook and import the file from the "Data" tab, like this:

  • Import External Data
  • Import Data from text
  • Choose the file
  • In the import wizard window, in the "File origin" drop-down list, choose "65001 : Unicode (UTF-8)"

Then I just chose the right delimiter, and that was it for me.
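
The "65001 : Unicode (UTF-8)" setting only helps if the file really is UTF-8. A minimal sketch (Python 3, standard library only) for checking that before importing; the helper name describe_encoding is hypothetical and not part of pandas or Excel:

```python
import codecs


def describe_encoding(raw):
    """Report whether raw bytes decode as UTF-8 and whether they start with a BOM."""
    has_bom = raw.startswith(codecs.BOM_UTF8)
    try:
        raw.decode('utf-8')
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {'valid_utf8': valid_utf8, 'has_bom': has_bom}


# In practice, raw would come from: open('your_file.csv', 'rb').read()
utf8_sample = u'caf\u00e9'.encode('utf-8')
latin1_sample = u'caf\u00e9'.encode('iso-8859-1')

utf8_report = describe_encoding(utf8_sample)
latin1_report = describe_encoding(latin1_sample)
```

A file that fails this check was probably written with a different encoding, in which case a different "File origin" entry would be the one to try.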

Edison Arcángel answered Oct 27 '22 00:10



I think using a different Excel writer helps; I recommend xlsxwriter:

import pandas as pd
df = ...
writer = pd.ExcelWriter('file.xlsx', engine='xlsxwriter')
df.to_excel(writer)
writer.close()  # writes the workbook to disk; writer.save() was removed in pandas 2.0
Deo Leung answered Oct 26 '22 22:10



Maybe try this function for your columns if you can't get Excel to cooperate. It will remove the accents using the unicodedata library:

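import unicodedata

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3

def remove_accents(input_str):
    # Leave anything that is not a text string (numbers, NaN) unchanged.
    if not isinstance(input_str, text_type):
        return input_str
    # NFKD splits an accented character into its base character plus combining marks.
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    # Keep only the base characters.
    return u"".join(c for c in nfkd_form if not unicodedata.combining(c))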
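
As a rough illustration of applying this to a column, here is a self-contained Python 3 sketch (so str takes the place of unicode). The DataFrame and its column names are invented for the example. Note that characters without a Unicode decomposition, such as ø, are left unchanged:

```python
import unicodedata

import pandas as pd


def remove_accents(value):
    # Leave non-string values (numbers, NaN) untouched.
    if not isinstance(value, str):
        return value
    # NFKD separates base characters from their combining accents.
    decomposed = unicodedata.normalize('NFKD', value)
    return ''.join(c for c in decomposed if not unicodedata.combining(c))


# Hypothetical data: \u00e9 is e-acute, \u00f1 is n-tilde, \u00f8 is o-slash.
df = pd.DataFrame({'name': ['Jos\u00e9', 'Mu\u00f1oz', 'S\u00f8ren', 42]})

# Apply the function element by element and keep the original column.
df['name_ascii'] = df['name'].map(remove_accents)
```

Keeping the original column alongside the stripped one makes it possible to export the ASCII version for Excel without losing the accented data.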
Greg Friedman answered Oct 27 '22 00:10
