I have this character in an XML file:
<data> <products> <color>fumè</color> </products> </data>
I try to generate an instance of ElementTree with the following code:
string_data = open('file.xml')
x = ElementTree.fromstring(unicode(string_data.encode('utf-8')))
and I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 185: ordinal not in range(128)
(NOTE: The position is not exact, I sampled the xml from a larger one).
How can I solve it? Thanks.
The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data. Changed in version 3.3: This module will use a fast implementation whenever available.
The parse() function is used to parse from files and file-like objects. As an example of such a file-like object, the following code uses the BytesIO class for reading from a string instead of an external file.
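As a rough illustration of the BytesIO approach described above, here is a minimal Python 3 sketch. The XML content is adapted from the question, and the variable names are only illustrative. It assumes the data is UTF-8 encoded, which is what the parser expects when no XML declaration says otherwise.

```python
import io
import xml.etree.ElementTree as ET

# Sample document adapted from the question, encoded to UTF-8 bytes.
# (Assumption: the real file is UTF-8 encoded.)
xml_bytes = '<data><products><color>fum\u00e8</color></products></data>'.encode('utf-8')

# parse() accepts a file-like object; BytesIO wraps the bytes so no
# external file is needed, and the parser does the decoding itself.
tree = ET.parse(io.BytesIO(xml_bytes))
root = tree.getroot()

color = root.find('products/color').text
print(color)
```

Handing the parser raw bytes avoids any manual encode or decode step in your own code.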
If you ran into this problem while using Requests (HTTP for Humans), note that response.text decodes the response by default. You can use response.content instead to get the undecoded bytes, so ElementTree can decode them itself. Just remember to use the correct encoding.
More info: http://docs.python-requests.org/en/latest/user/quickstart/#response-content
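A small Python 3 sketch of that idea, under some assumptions: FakeResponse is a hypothetical stand-in for a requests.Response object (so the example runs without a network call), parse_xml_response is an illustrative helper name, and the payload declares ISO-8859-1 purely to show that the parser reads the encoding from the XML declaration.

```python
import xml.etree.ElementTree as ET


def parse_xml_response(response):
    # Hypothetical helper: pass the raw bytes (response.content), not the
    # already-decoded response.text, so the XML parser can apply the
    # encoding named in the XML declaration.
    return ET.fromstring(response.content)


class FakeResponse:
    # Stand-in for a requests.Response; only the .content attribute is mimicked.
    content = (
        '<?xml version="1.0" encoding="ISO-8859-1"?>'
        '<data><color>fum\u00e8</color></data>'
    ).encode('latin-1')


root = parse_xml_response(FakeResponse())
color = root.find('color').text
print(color)
```

With a real response from Requests, the same helper would be called on the object returned by requests.get(...).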
You need to decode UTF-8 byte strings into a unicode object. So
string_data.encode('utf-8')
should be
string_data.decode('utf-8')
assuming string_data is actually a UTF-8 encoded byte string.
So to summarize: to get a UTF-8 byte string from a unicode object, you encode the unicode object (using the UTF-8 encoding), and to turn a byte string into a unicode object, you decode the byte string using its respective encoding.
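The round trip summarized above can be sketched as follows. This is written for Python 3, where the unicode type is called str and the byte string type is called bytes. The question's code is Python 2, where the same roles are played by unicode and str. The variable names are illustrative.

```python
# Text (unicode) value containing the character from the question.
text = 'fum\u00e8'

# Encoding turns text into bytes; the accented character becomes two bytes in UTF-8.
encoded = text.encode('utf-8')

# Decoding turns bytes back into text, using the same encoding.
decoded = encoded.decode('utf-8')

# The ASCII codec cannot represent the accented character, which is the
# kind of failure reported in the question.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(repr(encoded), decoded == text, ascii_failed)
```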
For more details on the concepts I suggest reading The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (not Python specific).