I have the following piece of XML:
...<span class="st">In Tim <em>Power</em>: Politieman...</span>...
I want to extract the part between the <span> tags. For this I use the XPath expression:
/span[@class="st"]
This, however, will extract everything including the <span> tags themselves. And:
/span[@class="st"]/text()
will return a list of two text nodes, one containing "In Tim ", the other ": Politieman...". The <em>...</em> element is not included and is treated like a separator.
Is there a pure XPath solution which returns:
In Tim <em>Power</em>: Politieman...
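(For reference, the behaviour described above can be reproduced with lxml in Python; lxml is an assumption here, the post does not name a parser.)

```python
from lxml import etree

xml = '<span class="st">In Tim <em>Power</em>: Politieman...</span>'
span = etree.fromstring(xml)

# /span[@class="st"]/text() selects only the direct text children;
# the <em> element splits them into two separate text nodes.
print(span.xpath('/span[@class="st"]/text()'))
# ['In Tim ', ': Politieman...']
```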
EDIT
Thanks to @helderdarocha and @TextGeek. It seems non-trivial to extract the text including the <em> element with XPath alone.
The /span[@class="st"]/node() solution returns a list of the individual nodes, from which it is trivial to build a string in Python.
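A minimal sketch of that last step, assuming lxml (the post does not name a library): node() yields plain strings for the text nodes and Element objects for <em>, so serializing the elements and joining everything gives back the inner markup.

```python
from lxml import etree

span = etree.fromstring(
    '<span class="st">In Tim <em>Power</em>: Politieman...</span>')

# node() returns text nodes (as strings in lxml) and element nodes.
parts = span.xpath('/span[@class="st"]/node()')

# Serialize element nodes back to markup; with_tail=False stops
# lxml from appending the text that follows </em> a second time.
inner = ''.join(
    part if isinstance(part, str)
    else etree.tostring(part, encoding='unicode', with_tail=False)
    for part in parts
)
print(inner)  # In Tim <em>Power</em>: Politieman...
```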
Sounds like you want the equivalent of the JavaScript DOM innerHTML property, but for XML. I don't think there's a way to do that in pure XPath.
XPath doesn't really operate on markup strings like "<em>" and "</em>" at all; it works with a tree of node objects (there might be an XPath implementation that tries to work directly off markup, but I doubt it). Most XPath implementations wouldn't even have the four characters "<em>" anywhere (except maybe kept around for printing error messages), and of course the DOM could have been built from scratch rather than parsed from XML in the first place. Likewise, XPath isn't designed to hand back marked-up strings, only lists of nodes.
In XSLT or XQuery you can do this easily, but not in XPath by itself, unless I'm missing something.
-s
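To illustrate the XSLT route: a sketch using lxml's bundled XSLT 1.0 processor (the stylesheet below is my own illustration, not from the answer), where xsl:copy-of copies the span's child nodes, markup included.

```python
from lxml import etree

# xsl:copy-of performs a deep copy of the selected nodes,
# so the <em> element survives in the output.
stylesheet = etree.XSLT(etree.fromstring('''
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:template match="/">
    <xsl:copy-of select="/span[@class='st']/node()"/>
  </xsl:template>
</xsl:stylesheet>
'''))

doc = etree.fromstring(
    '<span class="st">In Tim <em>Power</em>: Politieman...</span>')
print(str(stylesheet(doc)))  # In Tim <em>Power</em>: Politieman...
```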
To get all the child nodes you can use:
/span[@class="st"]/node()
This will return all three child nodes: the two text nodes and the complete <em> node (element and contents).
If you actually want all the text() nodes, including the ones inside <em>, then get all the text() descendants:
/span[@class="st"]//text()
or
/span[@class="st"]/descendant::text()
This will return three text nodes: the text inside <em> is included, but the <em> element itself is not.
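A quick check of the difference, again assuming lxml:

```python
from lxml import etree

span = etree.fromstring(
    '<span class="st">In Tim <em>Power</em>: Politieman...</span>')

# Descendant text() nodes include the text inside <em>.
texts = span.xpath('/span[@class="st"]//text()')
print(texts)           # ['In Tim ', 'Power', ': Politieman...']
print(''.join(texts))  # In Tim Power: Politieman... (plain text, no markup)
```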