I am creating an XML file in Python, and one field of the XML holds the contents of a text file. I do it like this:
    f = open('myText.txt', "r")
    data = f.read()
    f.close()
    root = ET.Element("add")
    doc = ET.SubElement(root, "doc")
    field = ET.SubElement(doc, "field")
    field.set("name", "text")
    field.text = data
    tree = ET.ElementTree(root)
    tree.write("output.xml")
And then I get a UnicodeDecodeError. I already tried putting the special comment # -*- coding: utf-8 -*- at the top of my script, but I still got the error. I also tried to enforce the encoding of my variable with data.encode('utf-8'), but I still got the error. I know this issue is very common, but none of the solutions from other questions worked for me.
UPDATE
Traceback: Using only the special comment on the first line of the script
Traceback (most recent call last):
  File "D:\Python\lse\createxml.py", line 151, in <module>
    tree.write("D:\\python\\lse\\xmls\\" + items[ctr][0] + ".xml")
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 820, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 937, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1073, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 243: ordinal not in range(128)
Traceback: Using .encode('utf-8')
Traceback (most recent call last):
  File "D:\Python\lse\createxml.py", line 148, in <module>
    field.text = data.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 227: ordinal not in range(128)
I then used .decode('utf-8') instead. The error message no longer appeared and my XML file was created successfully. The problem now is that the XML is not viewable in my browser.
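A UnicodeDecodeError is raised when a byte string (str in Python 2) is decoded with a codec that cannot interpret it. Each codec maps only certain byte sequences to Unicode characters, so a byte sequence that is invalid for the chosen codec makes decode() fail. Here the ascii codec rejects byte 0xc2 because ASCII covers only bytes 0 to 127.
The reverse also holds: a codec can represent only a limited set of Unicode characters as bytes, so encoding a character that the codec does not cover raises UnicodeEncodeError. In Python 2, calling encode() on a str that already holds bytes first decodes it implicitly with the ascii codec, which is why data.encode('utf-8') raised UnicodeDecodeError instead of fixing the problem. To avoid these errors, call decode('utf-8') on bytes coming in and encode('utf-8') on text going out.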
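The two failure modes described above can be reproduced in a few lines. This is only a minimal sketch: the sample bytes and the variable names (raw, decode_error, encode_error) are made up for illustration and do not come from the question's file.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: UTF-8 bytes for "café ©" (0xc2 0xa9 is the copyright sign).
raw = b"caf\xc3\xa9 \xc2\xa9"

# Decoding with the wrong codec fails: ASCII has no meaning for bytes above 0x7f.
decode_error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    decode_error = exc

# Decoding with the codec the bytes were written in gives Unicode text.
text = raw.decode("utf-8")

# Encoding text with a codec that cannot represent it fails the other way round.
encode_error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    encode_error = exc

# Encoding back with UTF-8 restores the original bytes.
round_trip = text.encode("utf-8")
```

The sketch avoids the implicit ASCII decode by always calling decode() on bytes and encode() on text, never the other way round.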
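You need to decode the data you read from the file into unicode before using it, to avoid encoding problems. The coding comment only declares the encoding of the script's own source text; it does not change how data read from a file is decoded.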
field.text = data.decode("utf8")
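Putting the decode step into the script from the question might look like the sketch below. Several things here are assumptions rather than part of the original: the helper name build_xml, the temporary directory, the sample text, the use of io.open to decode while reading (instead of calling decode() afterwards), and the explicit encoding and xml_declaration arguments to tree.write. The source does not confirm whether an explicit declaration resolves the browser problem mentioned in the question.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET


def build_xml(text_path, xml_path):
    # io.open decodes while reading, so data is Unicode text, not raw bytes.
    with io.open(text_path, "r", encoding="utf-8") as f:
        data = f.read()

    root = ET.Element("add")
    doc = ET.SubElement(root, "doc")
    field = ET.SubElement(doc, "field")
    field.set("name", "text")
    field.text = data

    tree = ET.ElementTree(root)
    # Write UTF-8 bytes and state that encoding in the XML declaration.
    tree.write(xml_path, encoding="utf-8", xml_declaration=True)


# Hypothetical input file, created only so the sketch runs on its own.
workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, "myText.txt")
xml_path = os.path.join(workdir, "output.xml")
with io.open(text_path, "w", encoding="utf-8") as f:
    f.write(u"Copyright \u00a9 2014")

build_xml(text_path, xml_path)
```

Reading with an explicit encoding keeps the conversion in one place, so every later step works with Unicode text and only tree.write deals with bytes.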