I have a set of XML files in which strings of text are enclosed by elements and then subdivided by other elements. For example:
<a>rhino<b>c<c>er</c></b>os</a>
<a> contains a single word, with sets of characters within that word marked up for various reasons. However, I want to be able to write a query that retrieves the whole word, that is, the text string in <a> without spaces or any trace that some of the text comes from descendant elements at all (in the example, the result should be "rhinoceros").
How do I do this? I have researched multiple methods for retrieving descendant text nodes, but they all either omit part of the desired word or, at best (//w/descendant-or-self::*/text()), retrieve all the text nodes but as separate search results.
I am still a beginner at all things XML, so apologies if I am asking something quite basic. I am happy to take reading recommendations in lieu of a straightforward answer.
Thank you!
If you have XPath 2.0, use
string-join(//text(), '')
on the XML
<a>rhino<b>c<c>er</c></b>os</a>
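It selects every text node in the document, in document order, and concatenates them with an empty separator, so nothing is inserted between the pieces. If the document holds more than the one word, narrow the path (for example, //a//text()) so that only the text nodes under the element you want are joined.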
Try it on https://www.freeformatter.com/xpath-tester.html
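If you want to check the same idea outside an XPath tester, here is a minimal sketch in Python using only the standard library. It is an analogue of the string-join approach, not XPath itself: ElementTree's itertext() walks the text of an element and its descendants in document order, and the pieces are joined with an empty string. The function name joined_text is just a name chosen for this illustration.

```python
import xml.etree.ElementTree as ET


def joined_text(xml_source: str) -> str:
    """Concatenate all text inside the root element, in document order.

    joined_text is a hypothetical helper name used for this sketch only.
    """
    root = ET.fromstring(xml_source)
    # itertext() yields the element's own text, then the text of each
    # descendant, then the text that follows each descendant (its "tail").
    return "".join(root.itertext())


word = joined_text("<a>rhino<b>c<c>er</c></b>os</a>")
print(word)  # rhinoceros
```

Joining with an empty string is what keeps the result free of spaces; any whitespace that is actually present in the source text would still be carried through.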
Welcome to the wonderful world of XPath and XPath-based languages!
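The string() function may be the most direct route to achieving your goal. It returns the string value of the item provided as its argument, and the string value of an element is the concatenation of all its descendant text nodes in document order. The example below passes the element inline, which is XQuery syntax; in plain XPath you would pass a path to the element instead, such as string(/a). So: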
string(<a>rhino<b>c<c>er</c></b>os</a>)
... will return:
rhinoceros
See the XPath and XQuery Functions and Operators specification for this function:
https://www.w3.org/TR/xpath-functions/#func-string
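Because the string value is computed per element, the same idea extends to a file with many words: take the value of each element separately and you get one whole word per result, rather than one result per text node. The sketch below illustrates this in Python with the standard library. The <words> wrapper, the second sample word, and the name string_values are all assumptions made up for this illustration; ElementTree does not evaluate string() itself, so the per-element join stands in for it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the <words> wrapper and the second word
# are invented for this sketch and are not from the original question.
SAMPLE = "<words><a>rhino<b>c<c>er</c></b>os</a> <a>ele<b>ph</b>ant</a></words>"


def string_values(xml_source: str, tag: str) -> list[str]:
    """Return one concatenated string per element with the given tag."""
    root = ET.fromstring(xml_source)
    # For each matching element, join its own text and all descendant text,
    # which mirrors what string() returns for an element node.
    return ["".join(element.itertext()) for element in root.iter(tag)]


words = string_values(SAMPLE, "a")
print(words)  # ['rhinoceros', 'elephant']
```

The space between the two <a> elements belongs to the wrapper, not to either word, so it does not appear in the results.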