Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python and ElementTree: return "inner XML" excluding parent element

In Python 2.6 using ElementTree, what's a good way to fetch the XML (as a string) inside a particular element, like what you can do in HTML and javascript with innerHTML?

Here's a simplified sample of the XML node I am starting with:

<label attr="foo" attr2="bar">This is some text <a href="foo.htm">and a link</a> in embedded HTML</label>

I'd like to end up with this string:

This is some text <a href="foo.htm">and a link</a> in embedded HTML

I've tried iterating over the parent node and concatenating the tostring() of the children, but that gave me only the subnodes:

# returns only subnodes (e.g. <a href="foo.htm">and a link</a>)
''.join([et.tostring(sub, encoding="utf-8") for sub in node])

I can hack up a solution using regular expressions, but was hoping there'd be something less hacky than this:

re.sub("</\w+?>\s*?$", "", re.sub("^\s*?<\w*?>", "", et.tostring(node, encoding="utf-8")))
like image 273
Justin Grant Avatar asked Aug 09 '10 20:08

Justin Grant


1 Answers

How about:

from xml.etree import ElementTree as ET

xml = '<root>start here<child1>some text<sub1/>here</child1>and<child2>here as well<sub2/><sub3/></child2>end here</root>'
root = ET.fromstring(xml)

def content(tag):
    return tag.text + ''.join(ET.tostring(e) for e in tag)

print content(root)
print content(root.find('child2'))

Resulting in:

start here<child1>some text<sub1 />here</child1>and<child2>here as well<sub2 /><sub3 /></child2>end here
here as well<sub2 /><sub3 />
like image 194
Mark Tolonen Avatar answered Sep 16 '22 13:09

Mark Tolonen