I have a dict that's fed with URL responses, like this:
>>> d
{
0: {'data': u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'},
1: {'data': u'<p>some other data</p>'},
...
}
When I call an xml.etree.ElementTree function on these data values (d[0]['data']), I get the most famous error message:
UnicodeEncodeError: 'ascii' codec can't encode characters...
What should I do to this Unicode string to make it suitable for the ElementTree parser?
PS. Please don't send me links to Unicode & Python explanations. I have read them all already and unfortunately can't make use of them, as hopefully others can.
An encoding such as ASCII can represent only a limited set of Unicode characters. Encoding a string that contains any character outside that set fails and raises UnicodeEncodeError. To avoid this error, call encode('utf-8') explicitly when you need bytes and decode('utf-8') when you need text again.
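A minimal sketch of that failure mode, assuming a short sample string modelled on the question's data (the variable names are illustrative only). It is written to run on both Python 2 and Python 3: the ASCII codec rejects the CJK characters, while UTF-8 round-trips them.

```python
# Sample text containing characters outside the ASCII range (illustrative).
text = u'found "\u62c9\u67cf"'

# ASCII cannot represent the CJK characters, so encoding raises an error.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# UTF-8 can represent every Unicode character, so this always succeeds.
encoded = text.encode('utf-8')

# Decoding with the same codec recovers the original text.
round_tripped = encoded.decode('utf-8')
```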
To include Unicode characters in your Python source code, you can use Unicode escape sequences of the form \u0123 in a string literal. In Python 2.x, you also need to prefix the string literal with u.
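As a small illustration of the escape form, the sketch below builds a two-character string from \u escapes and checks the code points it contains. The variable names are made up for the example; the u prefix is required on Python 2 and accepted on Python 3.3 and later.

```python
# Two CJK characters written as \uXXXX escapes, so the source file stays ASCII.
escaped = u'\u62c9\u67cf'

# Each escape produces exactly one character with that code point.
code_points = [ord(ch) for ch in escaped]
```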
You'll have to encode it manually, to UTF-8:
ElementTree.fromstring(d[0]['data'].encode('utf-8'))
as the API only takes encoded bytes as input. UTF-8 is a good default for such data.
It'll be able to decode to unicode again from there:
>>> from xml.etree import ElementTree
>>> p = ElementTree.fromstring(u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'.encode('utf8'))
>>> p.text
u'found "\u62c9\u67cf \u591a\u516c \u56ed"'
>>> print p.text
found "拉柏 多公 园"
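To apply the same idea to every entry of the dict from the question, something along these lines may help. parse_fragments is a hypothetical helper, not part of ElementTree, and the sketch assumes every 'data' value is a well-formed XML fragment. It encodes text to UTF-8 before parsing and should run on both Python 2 and Python 3.

```python
from xml.etree import ElementTree


def parse_fragments(responses):
    """Hypothetical helper: parse each entry's 'data' value into an Element."""
    parsed = {}
    for key, entry in responses.items():
        raw = entry['data']
        # Encode text to UTF-8 bytes; leave already-encoded bytes untouched.
        if not isinstance(raw, bytes):
            raw = raw.encode('utf-8')
        parsed[key] = ElementTree.fromstring(raw)
    return parsed


# Sample data shaped like the dict in the question.
d = {
    0: {'data': u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'},
    1: {'data': u'<p>some other data</p>'},
}
elements = parse_fragments(d)
```

Each value in elements is an Element, so elements[0].text gives back the Unicode text of the first fragment.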