I'm trying to do some data work in Python pandas and having trouble writing out my results. I read my data in as a CSV file and been exporting each script as it's own CSV file which works fine. Lately though I've tried exporting everything in 1 Excel file with worksheets and a few of the sheets give me an error
"'utf8' codec can't decode byte 0xe9 in position 1: invalid continuation byte"
I have no idea how to even start finding any characters that could be causing problems exporting to Excel. Not sure why it exports to CSV just fine though :(
relevant lines
from pandas import ExcelWriter
data = pd.read_csv(input)
writer = ExcelWriter(output) #output is just the filename
fundraisers.to_excel(writer, "fundraisers")
locations.to_excel(writer, "locations") #error
locations.to_csv(outputcsv) #works
writer.save()
printing head of offending dataframe
Event ID Constituent ID Email Address First Name \ Last Name
f 1 A A 1
F 4 L R C
M 1 1 A D
F 4 A A G
M 2 0 R G
M 3 O O H
M 2 T E H
M 2 A A H
M 2 M M K
F 3 J E K
Location ID raised raised con raised email
a 0 0 0
a 8 0 0
o 0 0 0
o 0 0 0
o 0 0 0
t 5 0 0
o 1 0 0
o 6 a 0
o 6 0 0
d 0 0 0
looking at the excel sheet I do actually get a partial print out. Anything in the first name column and beyond are blank, but event, constituent and email all print.
edit: Trying to read the csv in as utf8 fails, but reading it in as latin1 works. Is there a way to specify the to_excel encoding? Or decode and encode my dataframe to utf8?
Managed to solve this.
I made a function that goes through my columns that have strings and managed to decode/encode them into utf8 and it now works.
def changeencode(data, cols):
for col in cols:
data[col] = data[col].str.decode('iso-8859-1').str.encode('utf-8')
return data
In my case, the problem was that I was initially reading the CSV file with the wrong encoding (ASCII
instead of cp1252
). Therefore, when pandas tried to write it to an Excel file, it found some characters it couldn't decode.
I solved it by specifying the correct encoding when reading the CSV file.
data = pd.read_csv(fname, encoding='cp1252')
Actually, there is a way to force utf8 encoding by passing a parameter to ExcelWriter:
ew = pandas.ExcelWriter('test.xlsx',options={'encoding':'utf-8'})
sampleList = ['Miño', '1', '2', 'señora']
dataframe = pandas.DataFrame(sampleList)
dataframe.to_excel(ew)
ew.save()
The simplest thing is to load your dataframe in utf-8. Then it ExcelWriter will save it no problem.
data = pd.read_csv(path,encoding='utf-8')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With