XML parsing with ElementTree and multiple elements

Question

I need to parse an XML document which looks like this:

<tag>
   text1 text2 text3
  <some-tag/>
       More text
  <some-tag/>
       Some more text
  <some-tag/>
  Even more text
</tag>

Using ElementTree's text and tail attributes, I can get to "text1 text2 text3" and "Even more text".

However, I am unable to come up with a way to reach the text in the middle ("More text" and "Some more text").

Due to the idiosyncrasies of the software generating the XML, I cannot predict which stray tags will appear, and hence can't rely on find('some-tag').

Is there any way to parse this XML using Python?

Thanks

Justin O Barber · Accepted Answer

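"More text" and "Some more text" are the tails of the some-tag elements: in ElementTree, an element's tail is the text that follows its end tag, up to the next tag. See the following: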

>>> import xml.etree.ElementTree as et  # cElementTree was removed in Python 3.9
>>> text = """<tag>
   text1 text2 text3
  <some-tag/>
       More text
  <some-tag/>
       Some more text
  <some-tag/>
  Even more text
</tag>"""
>>> root = et.fromstring(text)
>>> for element in root:  # leaving aside the text and tail of root for the moment
    print(element.tag, ': text =>', element.text or '', 'tail =>', element.tail)

some-tag : text =>  tail =>  # the tail also has a newline character and white space at its beginning
       More text

some-tag : text =>  tail => 
       Some more text

some-tag : text =>  tail => 
  Even more text

Thus you will need to iterate over the children of each element and check whether each child has a tail.
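
As a minimal sketch of that iteration (not part of the original answer), the snippet below collects the parent's text plus every child's tail without ever looking at the child tag names, so it should not matter which stray tags the generating software emits. The helper name text_chunks and the constant XML are made up for illustration, and the sketch assumes you only want text directly inside the parent, with surrounding whitespace stripped.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question.
XML = """<tag>
   text1 text2 text3
  <some-tag/>
       More text
  <some-tag/>
       Some more text
  <some-tag/>
  Even more text
</tag>"""


def text_chunks(element):
    """Return the non-empty text pieces directly inside `element`, in document order.

    Hypothetical helper for illustration; child tag names are never inspected.
    """
    # element.text is the text before the first child; each child's tail is
    # the text between that child's end tag and the next tag.
    pieces = [element.text] + [child.tail for child in element]
    # Drop missing (None) or whitespace-only pieces and trim the rest.
    return [piece.strip() for piece in pieces if piece and piece.strip()]


root = ET.fromstring(XML)
chunks = text_chunks(root)
print(chunks)
# ['text1 text2 text3', 'More text', 'Some more text', 'Even more text']
```

If the stray tags can themselves contain text or nested elements, root.itertext() may be a better fit, since it walks the whole subtree and yields each text and tail in document order.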

Tags: python, xml, elementtree

Suvir
