I keep getting:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 265-266: ordinal not in range(128)
when I try:
df.to_html("mypage.html")
Here is a sample that reproduces the problem:
df = pd.DataFrame({"a": [u'Rue du Gu\xc3\xa9, 78120 Sonchamp'], "b": [u"some other thing"]})
df.to_html("mypage.html")
The elements in column "a" are of type "unicode".
When I export it to CSV it works, because you can specify the encoding:
df.to_csv("myfile.csv", encoding="utf-8")
The way it worked for me:
html = df.to_html()
with open("dataframe.html", "w", encoding="utf-8") as file:
    file.write('<meta charset="UTF-8">\n')
    file.write(html)
Your problem is in other code. Your sample code has a Unicode string that has been mis-decoded as latin1, Windows-1252, or similar, since it has UTF-8 sequences in it. Here I undo the bad decoding and redecode as UTF-8, but you'll want to find where the wrong decode is being performed:
>>> s = u'Rue du Gu\xc3\xa9, 78120 Sonchamp'
>>> s.encode('latin1').decode('utf8')
u'Rue du Gu\xe9, 78120 Sonchamp'
>>> print(s.encode('latin1').decode('utf8'))
Rue du Gué, 78120 Sonchamp
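As a stopgap until the wrong decode is found, the same round trip can be wrapped in a small helper. This is only a sketch: the name repair_mojibake is hypothetical, and it assumes the text was wrongly decoded with a single-byte codec such as latin1. Strings that do not round-trip cleanly are returned unchanged.

```python
def repair_mojibake(text, wrong_encoding="latin1"):
    """Undo a wrong decode: re-encode with the wrong codec, then decode as UTF-8.

    Hypothetical helper, not part of pandas or the standard library.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not mojibake of this kind, so leave it untouched.
        return text


broken = u"Rue du Gu\xc3\xa9, 78120 Sonchamp"
fixed = repair_mojibake(broken)            # UTF-8 bytes reinterpreted correctly
already_ok = repair_mojibake(u"Rue du Gu\xe9")  # does not round-trip, kept as-is
```

It could then be applied to the affected column with something like df["a"].map(repair_mojibake). Note that a string which legitimately contains a sequence such as "Ã©" would also be converted, so fixing the source of the bad decode remains the better option.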
The issue is actually in using df.to_html("mypage.html") to save the HTML to a file directly. If instead you write the file yourself, you can avoid this encoding bug with pandas.
html = df.to_html()
with open("mypage.html", "w", encoding="utf-8") as file:
    file.write(html)
You may also need to specify the character set in the head of the HTML for it to show up properly on certain browsers (HTML5 has UTF-8 as default):
<meta charset="UTF-8">
This was the only method that worked for me out of the several I've seen.
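Putting both pieces together, here is a minimal sketch that wraps the table in a complete page with the charset declared and writes it as UTF-8. The function name write_html_page is hypothetical, and the fragment string stands in for the output of df.to_html(). It uses io.open because the question appears to be on Python 2 (the values are of type "unicode"), where the built-in open() does not accept an encoding argument; io.open accepts it on both Python 2 and 3.

```python
import io
import os
import tempfile


def write_html_page(fragment, path, title=u"DataFrame"):
    """Wrap an HTML fragment in a full page and write it as UTF-8.

    Hypothetical helper; `fragment` is assumed to be a unicode string.
    """
    page = (
        u"<!DOCTYPE html>\n"
        u"<html>\n<head>\n"
        u'<meta charset="UTF-8">\n'
        u"<title>{0}</title>\n"
        u"</head>\n<body>\n{1}\n</body>\n</html>\n"
    ).format(title, fragment)
    # io.open takes an encoding on both Python 2 and Python 3.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(page)


# Stand-in for html = df.to_html()
fragment = u"<table><tr><td>Rue du Gu\xe9, 78120 Sonchamp</td></tr></table>"
out_path = os.path.join(tempfile.mkdtemp(), "mypage.html")
write_html_page(fragment, out_path)

# Read the raw bytes back to check what actually landed on disk.
with io.open(out_path, "rb") as f:
    raw = f.read()
```

With a real DataFrame, the call would be write_html_page(df.to_html(), "mypage.html").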