UnicodeEncodeError: 'ascii' codec can't encode characters

I have a dict that's fed with URL responses, like:

>>> d
{
0: {'data': u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'},
1: {'data': u'<p>some other data</p>'},
...
}

When I call an xml.etree.ElementTree function on one of these data values (d[0]['data']), I get the most famous error message:

UnicodeEncodeError: 'ascii' codec can't encode characters...

What should I do to this Unicode string to make it suitable for ElementTree parser?

PS. Please don't send me links to Unicode & Python explanations. Unfortunately I have read them all already and can't make use of them, as hopefully others can.

theta asked Nov 21 '12 12:11


1 Answer

You'll have to encode it manually, to UTF-8:

ElementTree.fromstring(d[0]['data'].encode('utf-8'))

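as the API only takes encoded bytes as input. Given a unicode object, Python 2 first encodes it implicitly with the ASCII codec, which is what raises the error. UTF-8 is a good default for such data.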

The parser decodes the bytes back to unicode from there:

>>> from xml.etree import ElementTree
>>> p = ElementTree.fromstring(u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'.encode('utf8'))
>>> p.text
u'found "\u62c9\u67cf \u591a\u516c \u56ed"'
>>> print p.text
found "拉柏 多公 园"
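
To apply the same fix to every entry of the dict from the question, a minimal sketch might look like the following. The helper name `parse_fragment` is hypothetical (it is not part of ElementTree), and the two-entry `d` simply mirrors the sample data shown in the question. The `isinstance` check is an assumption added so that values which are already byte strings are passed through untouched.

```python
from xml.etree import ElementTree


def parse_fragment(markup):
    """Parse an XML fragment, encoding text to UTF-8 bytes first.

    Hypothetical helper for illustration; not part of ElementTree.
    """
    # Only encode text strings; byte strings are assumed to be UTF-8 already.
    if not isinstance(markup, bytes):
        markup = markup.encode('utf-8')
    return ElementTree.fromstring(markup)


# Sample data mirroring the dict in the question.
d = {
    0: {'data': u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'},
    1: {'data': u'<p>some other data</p>'},
}

# Parse every 'data' value; each result is an Element.
parsed = {key: parse_fragment(value['data']) for key, value in d.items()}
```

The explicit encode is what avoids the implicit ASCII step on Python 2. On Python 3, `fromstring` also accepts text directly, so there the encode is harmless rather than required.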

Martijn Pieters answered Sep 19 '22 13:09