I have this sample XML document snippet:
<root>
<foo>bar</foo>
<foo>baz</foo>
</root>
I'm using Python's minidom module from xml.dom, and I read in the tags with getElementsByTagName("foo"). How do I get the text between the tags? And if the tags were nested, how would I get the text of those?
If you need to get the text out, you can do the following:
import xml.dom.minidom

document = "<root><foo>bar</foo><foo>baby</foo></root>"
dom = xml.dom.minidom.parseString(document)

def getText(nodelist):
    # Concatenate the data of the text nodes found directly in nodelist.
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

def handleTok(tokenlist):
    # Collect the text of every element, each preceded by a space.
    texts = ""
    for token in tokenlist:
        texts += " " + getText(token.childNodes)
    return texts

foo = dom.getElementsByTagName("foo")
text = handleTok(foo)
print(text)
The Python documentation has a good example: http://docs.python.org/library/xml.dom.minidom.html
EDIT: For nested tags, check the example on the site.
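For the nested case, one possible approach is to walk the child nodes recursively instead of looking only at direct text children. The following is a minimal sketch, not the documentation's own example; the helper name get_all_text and the nested sample document are made up for illustration.

```python
import xml.dom.minidom

def get_all_text(node):
    """Recursively collect text from a node and all of its descendants."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    # Elements contribute the text of their children; comments and other
    # childless nodes have an empty childNodes list and contribute nothing.
    return "".join(get_all_text(child) for child in node.childNodes)

# Hypothetical nested document, not taken from the original question.
nested = "<root><foo>bar <b>bold</b> tail</foo><foo>baz</foo></root>"
nested_dom = xml.dom.minidom.parseString(nested)

nested_texts = [get_all_text(foo) for foo in nested_dom.getElementsByTagName("foo")]
print(nested_texts)
```

Note that getElementsByTagName also returns matching elements nested inside other matches, so if a foo can contain another foo, its text would show up twice in the list.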
Here is how to do it with ElementTree:
import xml.etree.ElementTree as ET

# Named xml_text rather than xml so it is not confused with the xml package.
xml_text = '''\
<root>
<foo>bar</foo>
<foo>baz</foo>
</root>'''

for child in ET.fromstring(xml_text):
    print(child.tag, child.text)
Prints:
foo bar
foo baz
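For nested tags, child.text only gives the text before the first child element. A small sketch of one way around that, assuming you want all the text inside each foo flattened into one string, is to join the pieces from itertext(). The nested sample document here is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested document, not taken from the original question.
nested_xml = "<root><foo>bar <b>bold</b> tail</foo><foo>baz</foo></root>"
root = ET.fromstring(nested_xml)

# itertext() yields the text and tail strings of the element and all of
# its descendants, in document order.
et_texts = ["".join(foo.itertext()) for foo in root.iter("foo")]
print(et_texts)
```

root.iter("foo") finds foo elements at any depth, whereas iterating over the root directly, as above, only visits its immediate children.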