I am using Python 3 and the ElementTree API. I have some XML of the form:
<root>
<item>Over the <ref id="river" /> and through the <ref id="woods" />.</item>
<item>To Grandmother's <ref id="house" /> we go.</item>
</root>
I want to be able to iterate through the text and child nodes for a given item in order. So, for the first item, the list I want printed line by line would be:
Over the
<Element 'ref' at 0x######>
and through the
<Element 'ref' at 0x######>
.
But I can't figure out how to do this with ElementTree. I can get the text in order via itertext() and the child elements in order in several ways, but not the two interleaved together in order. I was hoping I could use an XPath expression like ./@text|./ref, but ElementTree's subset of XPath doesn't seem to support attribute selection. If I could even just get the original raw XML contents of each item node, I could parse it out myself if necessary.
To iterate over all nodes, use the iter() method on the ElementTree rather than looping over the root Element. The root is an Element like any other in the tree, and looping over it directly yields only its immediate children. The ElementTree has the context for all elements.
To read an XML file using ElementTree, first import the ElementTree module from the xml.etree package under the name ET (a common convention). Then pass the filename of the XML file to ET.parse() to parse it, and call getroot() on the result to get the root (top-level) element.
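The steps above can be sketched roughly as follows. This is only an illustration: the temporary file stands in for a real XML file on disk, and the variable names (XML, path, tags) are made up for the example.

```python
import os
import tempfile
from xml.etree import ElementTree as ET

# Sample document from the question, written to a temporary file so that
# ET.parse() has something to read. A real script would use its own path.
XML = """<root>
<item>Over the <ref id="river" /> and through the <ref id="woods" />.</item>
<item>To Grandmother's <ref id="house" /> we go.</item>
</root>"""

with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", delete=False, encoding="utf-8"
) as fh:
    fh.write(XML)
    path = fh.name

try:
    tree = ET.parse(path)      # returns an ElementTree
    root = tree.getroot()      # the top-level <root> Element
    # tree.iter() walks every element in document order, root included.
    tags = [el.tag for el in tree.iter()]
finally:
    os.remove(path)

print(root.tag)
print(tags)
```

Note that this walks elements only; the text between them still has to be read from each element's .text and .tail.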
Try this:
from xml.etree import ElementTree as ET

xml = """<root>
<item>Over the <ref id="river" /> and through the <ref id="woods" />.</item>
<item>To Grandmother's <ref id="house" /> we go.</item>
</root>"""

root = ET.fromstring(xml)
for item in root:
    if item.text:
        print(item.text)
    for ref in item:
        print(ref)
        if ref.tail:
            print(ref.tail)
ElementTree's representation of "mixed content" is based on the .text and .tail attributes. The .text of an element represents the text of the element up to the first child element. That child's .tail then contains the text of its parent following it. See the API doc.
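If the interleaved sequence is needed in more than one place, the .text/.tail logic could be wrapped in a small generator. This is a minimal sketch; iter_mixed is a hypothetical helper name, not part of the ElementTree API.

```python
from xml.etree import ElementTree as ET


def iter_mixed(element):
    """Yield an element's text chunks and direct children in document order.

    Hypothetical helper (not part of ElementTree): strings are text,
    everything else is a child Element.
    """
    if element.text:
        yield element.text
    for child in element:
        yield child
        # The text that follows a child lives on the child's .tail.
        if child.tail:
            yield child.tail


root = ET.fromstring(
    '<root><item>Over the <ref id="river" /> and through the '
    '<ref id="woods" />.</item></root>'
)
parts = list(iter_mixed(root[0]))

for part in parts:
    print(part)
```

Callers can tell the two kinds of item apart with isinstance(part, str). The helper only covers one level; nested mixed content would need a recursive call on each child.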