
Extract text from XML node that comes after child node

I'm trying to parse an XML document in which some elements contain text, then a child element, and then more text. For example, see the second "post" element in the XML below:

<?xml version="1.0"?>
<data>
    <post>
        this is some text
    </post>
    <post>
        here is some more text
        <quote> and a nested node </quote>
        and more text after the nested node
    </post>
</data>

I used the following code to try to print out the text of each node:

import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()

for child in root:
    print(child.text)

But unfortunately the only output is:

this is some text
here is some more text

Note that the output is missing the text "and more text after the nested node".

So,

  1. Is this valid XML?
  2. If yes, how can I use ElementTree or another Python XML library to achieve the desired parse?
  3. If no, any suggestions to parse the XML short of writing my own parser?
jdillard asked May 24 '18 16:05

1 Answer

Ah, found the answer here: How can I iterate child text nodes (not descendants) in ElementTree?

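Basically, I have to use the .tail attribute of the child node to access the text that was missing. ElementTree stores the text between an element's start tag and its first child in .text, and the text that follows a child's end tag in that child's .tail, not in the parent's .text. So yes, this is valid XML (text interleaved with child elements is called mixed content), and ElementTree can parse it.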
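
For anyone landing here, a minimal sketch of the .tail approach, using the XML from the question inlined as a string instead of test.xml. The helper names direct_text and all_text are my own, not part of ElementTree; the second one relies on Element.itertext(), which also includes the text of nested elements.

```python
import xml.etree.ElementTree as ET

# The document from the question, inlined so the example is self-contained.
XML = """<?xml version="1.0"?>
<data>
    <post>
        this is some text
    </post>
    <post>
        here is some more text
        <quote> and a nested node </quote>
        and more text after the nested node
    </post>
</data>"""


def direct_text(elem):
    """Text that belongs to elem itself, skipping the text of nested elements.

    Hypothetical helper: combines elem.text with the .tail of each child.
    """
    parts = [elem.text or ""]
    for child in elem:
        # Text after the child's closing tag lives on the child, not the parent.
        parts.append(child.tail or "")
    return " ".join(p.strip() for p in parts if p.strip())


def all_text(elem):
    """All text under elem, including nested elements, in document order."""
    return " ".join(t.strip() for t in elem.itertext() if t.strip())


root = ET.fromstring(XML)
posts = root.findall("post")

for post in posts:
    print(direct_text(post))
    print(all_text(post))
```

Which helper to use depends on whether the contents of <quote> should be part of the output: direct_text leaves them out, all_text keeps them.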

jdillard answered Oct 13 '22 11:10