XPath / XQuery: find text in a node, but ignoring content of specific descendant elements

Tags: xpath, xquery

I am trying to find a way to search for a string within nodes while excluding the content of some subelements of those nodes. Put simply, I want to search for a string in paragraphs of a text, excluding the footnotes, which are child elements of the paragraphs.

For example, given this document:

<document>
   <p n="1">My text starts here/</p>
   <p n="2">Then it goes on there<footnote>It's not a very long text!</footnote></p>
</document>

When I search for "text", I would like the XPath / XQuery expression to retrieve the first p element, but not the second one (where "text" occurs only in the footnote subelement).

I have tried the contains() function, but it retrieves both p elements.

Any help would be much appreciated :)

Hemka asked Jan 19 '11 12:01


1 Answer

I want to search for a string in paragraphs of a text, excluding the footnotes, which are child elements of the paragraphs

An XPath 1.0-only solution:

Use:

//p//text()[not(ancestor::footnote) and contains(.,'text')]

Against the following XML document (derived from yours, with a p element added inside the footnote to make this more interesting):

<document>
    <p n="1">My text starts here/</p>
    <p n="2">Then it goes on there
        <footnote>It's not a very long text!
           <p>text</p>
        </footnote>
    </p>
</document>

this XPath expression selects exactly the wanted text node:

My text starts here/
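
Note that the expression above returns a text node, not the p element the question asks for. Moving the same test into a predicate on p should give the element itself, for example //p[.//text()[not(ancestor::footnote)][contains(., 'text')]], though that still checks one text node at a time. As a rough cross-check, here is a minimal Python sketch of the same filtering idea using only the standard library. It is an illustration under assumptions, not part of the XPath solution: the helper names text_outside and find_paragraphs are hypothetical, and the tree is walked by hand because xml.etree.ElementTree does not support the ancestor axis. Unlike the XPath, it searches the concatenated non-footnote text of each p, so a match split across inline elements would also be found.

```python
import xml.etree.ElementTree as ET

XML = """<document>
    <p n="1">My text starts here/</p>
    <p n="2">Then it goes on there
        <footnote>It's not a very long text!
           <p>text</p>
        </footnote>
    </p>
</document>"""


def text_outside(elem, skip="footnote"):
    """Concatenate the text of elem, leaving out any <skip> subtrees."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != skip:
            parts.append(text_outside(child, skip))
        # The tail follows the child's closing tag, so it belongs to elem.
        parts.append(child.tail or "")
    return "".join(parts)


def find_paragraphs(root, needle, tag="p", skip="footnote"):
    """Return <tag> elements outside <skip> whose non-skipped text contains needle."""
    hits = []

    def walk(elem):
        if elem.tag == skip:
            return  # never descend into a footnote
        if elem.tag == tag and needle in text_outside(elem, skip):
            hits.append(elem)
        for child in elem:
            walk(child)

    walk(root)
    return hits


root = ET.fromstring(XML)
matches = find_paragraphs(root, "text")
print([p.get("n") for p in matches])  # prints ['1']
```

Only the first paragraph is reported: the second one contains "text" solely inside its footnote, and the p nested in the footnote is never visited.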
Dimitre Novatchev answered Sep 27 '22 20:09