
Python ElementTree - iterate through child nodes and text in order

I am using Python 3 and the ElementTree API. I have some XML of the form:

<root>
  <item>Over the <ref id="river" /> and through the <ref id="woods" />.</item>
  <item>To Grandmother's <ref id="house" /> we go.</item>
</root>

I want to be able to iterate through the text and child nodes for a given item in order. So, for the first item, the list I want printed line by line would be:

Over the 
<Element 'ref' at 0x######>
 and through the 
<Element 'ref' at 0x######>
.

But I can't figure out how to do this with ElementTree. I can get the text in order via itertext(), and the child elements in order in several ways, but not the two interleaved in document order. I was hoping I could use an XPath expression like ./@text|./ref, but ElementTree's subset of XPath doesn't seem to support attribute selection. If I could even just get the original raw XML contents of each item node, I could parse it myself if necessary.

Asked by xdhmoore on Feb 11 '17 at 09:02



1 Answer

Try this:

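from xml.etree import ElementTree as ET

xml = """<root>
  <item>Over the <ref id="river" /> and through the <ref id="woods" />.</item>
  <item>To Grandmother's <ref id="house" /> we go.</item>
</root>"""

root = ET.fromstring(xml)

for item in root:
    # Text that appears before the first child element, if any
    if item.text:
        print(item.text)
    for ref in item:
        # The child element itself
        print(ref)
        # Text that follows this child, up to the next sibling or the end of the parent
        if ref.tail:
            print(ref.tail)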

ElementTree's representation of "mixed content" is based on the .text and .tail attributes. The .text of an element holds the text from its start tag up to its first child element. Each child's .tail then holds the parent's text that follows that child, up to the next sibling or the parent's end tag. See the API doc.
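If you need the interleaved sequence as a value rather than printed output, the same .text/.tail walk can be wrapped in a generator. This is only a minimal sketch: iter_mixed is a hypothetical helper name, not part of the ElementTree API, and it only looks at the direct children of the element you pass in.

```python
from xml.etree import ElementTree as ET


def iter_mixed(elem):
    """Yield text chunks (str) and direct child elements of elem in document order.

    Hypothetical helper, not part of ElementTree itself.
    """
    if elem.text:
        yield elem.text
    for child in elem:
        yield child
        if child.tail:
            yield child.tail


item = ET.fromstring(
    '<item>Over the <ref id="river" /> and through the <ref id="woods" />.</item>'
)

# Replace each element with its id attribute so the result is easy to read
parts = [p if isinstance(p, str) else p.get("id") for p in iter_mixed(item)]
print(parts)  # ['Over the ', 'river', ' and through the ', 'woods', '.']
```

Callers can tell text from elements with isinstance(p, str), as the list comprehension above does.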

Answered by dnswlt on Sep 21 '22 at 12:09