Python pandas to_excel 'utf8' codec can't decode byte

I'm trying to do some data work in Python pandas and am having trouble writing out my results. I read my data in from a CSV file and have been exporting each script's output as its own CSV file, which works fine. Lately, though, I've tried exporting everything to one Excel file with multiple worksheets, and a few of the sheets give me an error:

"'utf8' codec can't decode byte 0xe9 in position 1: invalid continuation byte"

I have no idea how to even start finding the characters that could be causing problems when exporting to Excel. I'm also not sure why the same data exports to CSV just fine :(

Relevant lines:

import pandas as pd
from pandas import ExcelWriter

data = pd.read_csv(input)
writer = ExcelWriter(output)  # output is just the filename
fundraisers.to_excel(writer, "fundraisers")
locations.to_excel(writer, "locations")  # error
locations.to_csv(outputcsv)  # works
writer.save()

Printing the head of the offending dataframe:

Event ID  Constituent ID  Email Address  First Name  Last Name  \
f         1               A              A           1
F         4               L              R           C
M         1               1              A           D
F         4               A              A           G
M         2               0              R           G
M         3               O              O           H
M         2               T              E           H
M         2               A              A           H
M         2               M              M           K
F         3               J              E           K

Location ID  raised  raised con  raised email
a            0       0           0
a            8       0           0
o            0       0           0
o            0       0           0
o            0       0           0
t            5       0           0
o            1       0           0
o            6       a           0
o            6       0           0
d            0       0           0

Looking at the Excel sheet, I do actually get a partial printout. Everything from the First Name column onward is blank, but Event ID, Constituent ID and Email Address all print.

Edit: Trying to read the CSV in as utf8 fails, but reading it in as latin1 works. Is there a way to specify the to_excel encoding? Or to decode and encode my dataframe to utf8?

Wizuriel asked Sep 05 '13 20:09


4 Answers

Managed to solve this.

I made a function that goes through my columns that have strings and managed to decode/encode them into utf8 and it now works.

def changeencode(data, cols):
    """Decode each listed column from ISO-8859-1 and re-encode it as UTF-8."""
    for col in cols:
        data[col] = data[col].str.decode('iso-8859-1').str.encode('utf-8')
    return data
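
A note for Python 3 readers: there, text values are already Unicode, so the decode/encode round trip above applies to raw bytes rather than to `str` values. The following is only a minimal sketch of the same idea on plain bytes, not the function from this answer. `reencode` and the sample name are hypothetical; it mainly illustrates why byte 0xe9 is rejected as UTF-8 but accepted as ISO-8859-1.

```python
def reencode(raw, src="iso-8859-1", dst="utf-8"):
    """Decode bytes using src, then encode the resulting text as dst."""
    return raw.decode(src).encode(dst)


# Hypothetical sample: 'Renée Smith' as it would be stored in an ISO-8859-1 file.
raw = b"Ren\xe9e Smith"

# In UTF-8, 0xe9 announces a three-byte sequence, so the plain "e" that
# follows it is not a valid continuation byte and decoding fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
    reason = None
except UnicodeDecodeError as exc:
    utf8_ok = False
    reason = exc.reason

# Decoding as ISO-8859-1 and re-encoding yields valid UTF-8 bytes.
fixed = reencode(raw)
```

After this, `fixed.decode("utf-8")` gives back the readable name, which is the state the Excel writer needs the data to be in.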
Wizuriel answered Sep 25 '22 20:09


In my case, the problem was that I was initially reading the CSV file with the wrong encoding (ASCII instead of cp1252). Therefore, when pandas tried to write it to an Excel file, it found some characters it couldn't decode.

I solved it by specifying the correct encoding when reading the CSV file.

data = pd.read_csv(fname, encoding='cp1252')
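
If it is unclear which encoding the file uses, or which rows contain the problem characters, something like the following may help. It is a rough sketch using only the standard library. `find_undecodable_lines`, `first_working_encoding`, the candidate list and the sample data are all assumptions made for illustration. A successful decode does not prove the encoding is right (ISO-8859-1 accepts any byte sequence), so the result is a hint to check by eye.

```python
def find_undecodable_lines(raw, encoding="utf-8"):
    """Return (line_number, offset_in_line, offending_byte) for each bad line.

    Only the first undecodable byte of each line is reported.
    """
    problems = []
    for line_number, line in enumerate(raw.splitlines(), start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError as exc:
            bad_byte = line[exc.start:exc.start + 1]
            problems.append((line_number, exc.start, bad_byte))
    return problems


def first_working_encoding(raw, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Return the first candidate that decodes raw without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Hypothetical sample standing in for the bytes of a CSV file,
# e.g. obtained with open(fname, "rb").read().
sample = "First Name,Last Name\nAnna,Smith\nRenée,Dupont\n".encode("cp1252")

bad_lines = find_undecodable_lines(sample)
encoding = first_working_encoding(sample)
```

The name returned by `first_working_encoding` can then be passed as the `encoding` argument of `pd.read_csv`, as in the line above.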
Zenadix answered Sep 21 '22 20:09


Actually, there is a way to force utf8 encoding by passing a parameter to ExcelWriter:

import pandas

ew = pandas.ExcelWriter('test.xlsx', options={'encoding': 'utf-8'})
sampleList = ['Miño', '1', '2', 'señora']
dataframe = pandas.DataFrame(sampleList)
dataframe.to_excel(ew)
ew.save()
user3570953 answered Sep 25 '22 20:09


The simplest thing is to load your dataframe as utf-8. Then ExcelWriter will save it with no problem.

data = pd.read_csv(path,encoding='utf-8')
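
This only works if the file really is UTF-8, and the question notes that reading it as utf8 fails. One option in that case is to rewrite the file as UTF-8 first. Below is a minimal sketch, assuming the source encoding is known. cp1252 is used here purely as an assumption, and `transcode_to_utf8` and the file names are hypothetical.

```python
import os
import tempfile


def transcode_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Rewrite a text file as UTF-8, assuming it is stored in src_encoding."""
    # newline="" leaves the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


# Demonstration with a temporary file standing in for the real CSV.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "locations_cp1252.csv")
dst_path = os.path.join(tmp_dir, "locations_utf8.csv")

with open(src_path, "wb") as f:
    f.write("First Name\nRenée\n".encode("cp1252"))

transcode_to_utf8(src_path, dst_path)

with open(dst_path, "rb") as f:
    converted = f.read()
```

The converted file can then be loaded with `encoding='utf-8'` as shown above.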
billmanH answered Sep 25 '22 20:09