I need to parse an XML document that looks like this:
<tag>
text1 text2 text3
<some-tag/>
More text
<some-tag/>
Some more text
<some-tag/>
Even more text
</tag>
Using ElementTree's text and tail attributes, I can get to "text1 text2 text3" and "Even more text".
However, I am unable to come up with a way to reach the text in the middle ("More text" and "Some more text").
Due to the idiosyncrasies of the software generating the XML, I cannot be sure which stray tags will appear, so I can't rely on find('some-tag').
Is there any way to parse this XML using Python?
Thanks
"More text" and "Some more text" are the tails of the some-tag elements that precede them. See the following:
>>> import xml.etree.ElementTree as et
>>> text = """<tag>
... text1 text2 text3
... <some-tag/>
... More text
... <some-tag/>
... Some more text
... <some-tag/>
... Even more text
... </tag>"""
>>> root = et.fromstring(text)
>>> for element in root:  # leaving aside the text and tail of root for the moment
...     # repr() makes the newline at each end of the tail visible
...     print(element.tag, ': text =>', repr(element.text), 'tail =>', repr(element.tail))
...
some-tag : text => None tail => '\nMore text\n'
some-tag : text => None tail => '\nSome more text\n'
some-tag : text => None tail => '\nEven more text\n'
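So the text before the first child is the text of the parent, and every later chunk is the tail of the child that precedes it. You will need to iterate through the children of each element and check whether each child has a tail; this works whatever the child's tag name is, so you don't need find('some-tag').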
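The loop described above could be sketched roughly as follows. This assumes Python 3 and the standard-library xml.etree.ElementTree; the helper name text_chunks is made up for illustration, and stripping the surrounding whitespace is my own choice, not something the question asked for.

```python
import xml.etree.ElementTree as ET

XML = """<tag>
text1 text2 text3
<some-tag/>
More text
<some-tag/>
Some more text
<some-tag/>
Even more text
</tag>"""


def text_chunks(element):
    """Return the direct text chunks of `element`, in document order.

    Hypothetical helper: it never looks at the children's tag names,
    so unexpected stray tags are handled the same way as some-tag.
    """
    chunks = [element.text]                          # text before the first child
    chunks.extend(child.tail for child in element)   # text following each child
    # Drop missing (None) or whitespace-only chunks and trim the rest.
    return [c.strip() for c in chunks if c and c.strip()]


root = ET.fromstring(XML)
chunks = text_chunks(root)
print(chunks)
# ['text1 text2 text3', 'More text', 'Some more text', 'Even more text']

# itertext() walks the whole subtree and yields every text and tail in
# order; unlike text_chunks it also includes text inside nested children.
all_text = [t.strip() for t in root.itertext() if t.strip()]
print(all_text)
```

If the stray tags can themselves contain text that you want to keep, itertext() is probably the simpler option; if you only want the text sitting directly inside the parent, the text/tail loop is the safer one.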