Extract text from XML node that comes after child node

Question

I'm trying to parse an XML document in which some elements contain text, then a child element, and then more text. For example, see the second "post" element in the XML below:

<?xml version="1.0"?>
<data>
    <post>
        this is some text
    </post>
    <post>
        here is some more text
        <quote> and a nested node </quote>
        and more text after the nested node
    </post>
</data>

I used the following code to try to print out the text of each node:

import xml.etree.ElementTree as ET

tree = ET.parse('test.xml')
root = tree.getroot()

# Print the text of each <post> element
for child in root:
    print(child.text)

But unfortunately the only output is:

this is some text
here is some more text

Note that the text "and more text after the nested node" is missing.

So:

1. Is this valid XML?
2. If yes, how can I use ElementTree or another Python XML library to achieve the desired parse?
3. If no, any suggestions for parsing the XML short of writing my own parser?

jdillard · Accepted Answer

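Ah, found the answer in another question: "How can I iterate child text nodes (not descendants) in ElementTree?"

The XML is valid; this is ordinary mixed content. In ElementTree, an element's .text attribute only holds the text that appears before its first child. Text that follows a child's closing tag is stored on that child, in its .tail attribute. So I have to read the .tail of the child node (here, the "quote" element) to get the text that was missing before.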
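As a rough illustration of the .tail approach, here is a minimal sketch that uses the XML from the question as an inline string rather than reading test.xml. The helper names direct_text and all_text are made up for this example and are not part of ElementTree; the whitespace collapsing is also just one possible choice.

```python
import xml.etree.ElementTree as ET

# The sample document from the question, inlined so the sketch is self-contained.
XML = """<?xml version="1.0"?>
<data>
    <post>
        this is some text
    </post>
    <post>
        here is some more text
        <quote> and a nested node </quote>
        and more text after the nested node
    </post>
</data>"""


def direct_text(element):
    """Hypothetical helper: text owned by `element` itself, excluding child content.

    Combines element.text (text before the first child) with each
    child's .tail (text after that child's closing tag).
    """
    parts = [element.text or ""]
    for child in element:
        parts.append(child.tail or "")
    # Collapse runs of whitespace and newlines into single spaces.
    return " ".join(" ".join(parts).split())


def all_text(element):
    """Hypothetical helper: all text inside `element`, including descendants."""
    return " ".join("".join(element.itertext()).split())


root = ET.fromstring(XML)
for post in root:
    print(direct_text(post))
```

If the text inside the nested "quote" element is wanted as well, itertext() (used in all_text above) walks the element and its descendants in document order, so it yields .text and .tail pieces without handling them by hand.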
