Use either ET.tostring(root).decode() or ET.tostring(root, encoding='unicode', method='xml') instead.
Example: Read an XML File in Python. To read an XML file, first import the ElementTree module from the xml.etree package. Then pass the filename of the XML file to ElementTree.parse() to parse it. Finally, get the root element of the document with getroot().
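The steps above can be sketched as follows. The sample document and the temporary file are assumptions made only so the snippet runs on its own; in practice you would pass the path of your own XML file to ElementTree.parse().

```python
import os
import tempfile
from xml.etree import ElementTree

# Hypothetical sample document, written to a temporary file so that
# the example does not depend on any file already being present.
sample = '<People><Person Name="John" /></People>'
with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", delete=False, encoding="utf-8"
) as handle:
    handle.write(sample)
    path = handle.name

try:
    tree = ElementTree.parse(path)  # returns an ElementTree instance
    root = tree.getroot()           # the top-level Element of the document
    root_tag = root.tag
    names = [person.get("Name") for person in root.iter("Person")]
finally:
    os.remove(path)                 # clean up the temporary file

print(root_tag, names)
```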
Element objects have no .getroot() method. Drop that call, and the .tostring() call works:
xmlstr = ElementTree.tostring(et, encoding='utf8', method='xml')
You only need to use .getroot() if you have an ElementTree instance.
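A minimal sketch of the difference between the two types, assuming a hand-built element rather than a parsed file (the variable names are illustrative only):

```python
from xml.etree import ElementTree

# A bare Element, and an ElementTree wrapper around it.
element = ElementTree.Element("Person", Name="John")
tree = ElementTree.ElementTree(element)

# Only the ElementTree wrapper exposes getroot().
element_has_getroot = hasattr(element, "getroot")
root = tree.getroot()

# tostring() takes the Element itself, not the ElementTree wrapper.
xml_bytes = ElementTree.tostring(root, encoding="utf-8", method="xml")
print(element_has_getroot, root is element, xml_bytes)
```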
Other notes:
This produces a bytestring, which in Python 3 is the bytes type.
If you must have a str object, you have two options:
Decode the resulting bytes value, from UTF-8: xmlstr.decode("utf8")
Use encoding='unicode'; this avoids an encode / decode cycle:
xmlstr = ElementTree.tostring(et, encoding='unicode', method='xml')
If you wanted the UTF-8 encoded bytestring value or are using Python 2, take into account that ElementTree doesn't properly detect utf8 as the standard XML encoding, so it'll add a <?xml version='1.0' encoding='utf8'?> declaration. Use utf-8 or UTF-8 (with a dash) if you want to prevent this. When using encoding="unicode" no declaration header is added.
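The effect of the encoding spelling on the declaration can be checked with a short Python 3 sketch like this one (the element and variable names are placeholders):

```python
from xml.etree import ElementTree

element = ElementTree.Element("Person", Name="John")

# 'utf8' (no dash) is not recognised as the default XML encoding,
# so an XML declaration is prepended to the output.
with_declaration = ElementTree.tostring(element, encoding="utf8")

# 'utf-8' (with a dash) produces bytes without a declaration.
without_declaration = ElementTree.tostring(element, encoding="utf-8")

# 'unicode' returns a str and never adds a declaration.
as_text = ElementTree.tostring(element, encoding="unicode")

print(with_declaration)
print(without_declaration)
print(as_text)
```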
ElementTree.Element to a String?
For Python 3:
xml_str = ElementTree.tostring(xml, encoding='unicode')
For Python 2:
xml_str = ElementTree.tostring(xml, encoding='utf-8')
The following is compatible with both Python 2 & 3, but it only keeps ASCII characters as-is; any other character is written as a numeric character reference:
xml_str = ElementTree.tostring(xml).decode()
from xml.etree import ElementTree
xml = ElementTree.Element("Person", Name="John")
xml_str = ElementTree.tostring(xml).decode()
print(xml_str)
Output:
<Person Name="John" />
Despite what the name implies, ElementTree.tostring() returns a bytestring by default in Python 2 & 3. This is an issue in Python 3, which uses Unicode for strings.
In Python 2 you could use the str type for both text and binary data. Unfortunately this confluence of two different concepts could lead to brittle code which sometimes worked for either kind of data, sometimes not. [...] To make the distinction between text and binary data clearer and more pronounced, [Python 3] made text and binary data distinct types that cannot blindly be mixed together.
Source: Porting Python 2 Code to Python 3
If we know what version of Python is being used, we can specify the encoding as unicode or utf-8. Otherwise, if we need compatibility with both Python 2 & 3, we can use decode() to convert into the correct type.
For reference, I've included a comparison of .tostring() results between Python 2 and Python 3.
ElementTree.tostring(xml)
# Python 3: b'<Person Name="John" />'
# Python 2: <Person Name="John" />
ElementTree.tostring(xml, encoding='unicode')
# Python 3: <Person Name="John" />
# Python 2: LookupError: unknown encoding: unicode
ElementTree.tostring(xml, encoding='utf-8')
# Python 3: b'<Person Name="John" />'
# Python 2: <Person Name="John" />
ElementTree.tostring(xml).decode()
# Python 3: <Person Name="John" />
# Python 2: <Person Name="John" />
Thanks to Martijn Pieters for pointing out that the str datatype changed between Python 2 and 3.
In most scenarios, using str() would be the "canonical" way to convert an object to a string. Unfortunately, using this with Element returns the tag name and the object's location in memory as a hex string, rather than a string representation of the object's data.
from xml.etree import ElementTree
xml = ElementTree.Element("Person", Name="John")
print(str(xml)) # <Element 'Person' at 0x00497A80>
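For contrast, a small Python 3 sketch that puts str() and tostring() side by side on the same element; the memory address in the first result varies from run to run:

```python
from xml.etree import ElementTree

xml = ElementTree.Element("Person", Name="John")

# str() falls back to the default repr: tag name plus memory address.
as_repr = str(xml)

# tostring() serialises the element's actual data.
as_xml = ElementTree.tostring(xml, encoding="unicode")

print(as_repr)
print(as_xml)
```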
This extends @Stevoisiak's answer to cover non-Latin characters. Only one call displays the non-Latin characters as-is, and that call is different on Python 3 and Python 2.
Input
xml = ElementTree.fromstring('<Person Name="크리스" />')
xml = ElementTree.Element("Person", Name="크리스") # Read Note about Python 2
NOTE: In Python 2, when calling the tostring(...) code, assigning xml with ElementTree.Element("Person", Name="크리스") will raise an error...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 0: ordinal not in range(128)
Output
ElementTree.tostring(xml)
# Python 3 (크리스): b'<Person Name="&#53356;&#47532;&#49828;" />'
# Python 3 (John): b'<Person Name="John" />'
# Python 2 (크리스): <Person Name="&#53356;&#47532;&#49828;" />
# Python 2 (John): <Person Name="John" />
ElementTree.tostring(xml, encoding='unicode')
# Python 3 (크리스): <Person Name="크리스" /> <-------- Python 3
# Python 3 (John): <Person Name="John" />
# Python 2 (크리스): LookupError: unknown encoding: unicode
# Python 2 (John): LookupError: unknown encoding: unicode
ElementTree.tostring(xml, encoding='utf-8')
# Python 3 (크리스): b'<Person Name="\xed\x81\xac\xeb\xa6\xac\xec\x8a\xa4" />'
# Python 3 (John): b'<Person Name="John" />'
# Python 2 (크리스): <Person Name="크리스" /> <-------- Python 2
# Python 2 (John): <Person Name="John" />
ElementTree.tostring(xml).decode()
# Python 3 (크리스): <Person Name="&#53356;&#47532;&#49828;" />
# Python 3 (John): <Person Name="John" />
# Python 2 (크리스): <Person Name="&#53356;&#47532;&#49828;" />
# Python 2 (John): <Person Name="John" />
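The Python 3 side of this comparison can be reproduced with a sketch like the one below. It assumes Python 3 only and builds the element directly instead of parsing it:

```python
from xml.etree import ElementTree

xml = ElementTree.Element("Person", Name="크리스")

# Default encoding is us-ascii, so non-ASCII characters are written
# as numeric character references.
default_bytes = ElementTree.tostring(xml)

# 'unicode' returns a str that keeps the characters as-is.
text = ElementTree.tostring(xml, encoding="unicode")

# 'utf-8' returns bytes; the characters are preserved but the bytes
# repr shows them as escape sequences until they are decoded.
utf8_bytes = ElementTree.tostring(xml, encoding="utf-8")

print(default_bytes)
print(text)
print(utf8_bytes.decode("utf-8"))
```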