
XPath expression: selecting text nodes between element nodes

Tags:

xpath

textnode

Based on the following HTML, I want to extract TextA, TextC and TextE.

<div id='content'>
    TextA
    <br/>
    <br/>
    <p>TextB</p>
    TextC
    <br/>
    TextC
    <p>TextD</p>
    TextE
</div>

I tried to get TextC like this, but I don't get the result I want:

  • Query:
    //*[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]
  • Expected result:
    ["TextC", <br/>, "TextC"]
  • Actual result:
    [<br/>]

Is there a way to select the text nodes without using indexes like //div/text()[1]?

Michael Wyss asked May 16 '26 00:05


1 Answer

The two text nodes are missing from your XPath result because * only matches elements. To match both element and text nodes, use node() instead:

//node()[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]

Or, if you want only the text nodes, i.e. excluding <br/>, use text() instead of node():

//text()[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]
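
To compare the three node tests side by side, here is a minimal sketch in Python. It assumes the third-party lxml library is installed and parses the fragment from the question as XML, which works because the fragment is well-formed. The names HTML, PREDICATE and describe are made up for this illustration. The last query is not part of the answer above; it is one possible way to get TextA, TextC and TextE in one go, assuming the wanted text nodes are always direct children of the div.

```python
from lxml import etree  # assumption: lxml is available (pip install lxml)

# The fragment from the question; well-formed, so an XML parser is enough.
HTML = """<div id='content'>
    TextA
    <br/>
    <br/>
    <p>TextB</p>
    TextC
    <br/>
    TextC
    <p>TextD</p>
    TextE
</div>"""

root = etree.fromstring(HTML)

# Shared predicate: a <p> containing TextB comes before the node,
# and a <p> containing TextD comes after it.
PREDICATE = '[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]'


def describe(node):
    """Render a result item: text nodes as stripped strings, elements as <tag/>."""
    if isinstance(node, str):  # lxml returns text nodes as str subclasses
        return node.strip()
    return "<%s/>" % node.tag


# * matches element nodes only, so the text nodes are skipped.
star = [describe(n) for n in root.xpath("//*" + PREDICATE)]
# node() matches any node type: elements and text nodes alike.
nodes = [describe(n) for n in root.xpath("//node()" + PREDICATE)]
# text() matches text nodes only.
texts = [describe(n) for n in root.xpath("//text()" + PREDICATE)]

# Hypothetical extra, not from the answer: every non-blank text node
# that is a direct child of the div, selected without positional indexes.
direct = [describe(n) for n in
          root.xpath("//div[@id='content']/text()[normalize-space()]")]

print(star)    # ['<br/>']
print(nodes)   # ['TextC', '<br/>', 'TextC']
print(texts)   # ['TextC', 'TextC']
print(direct)  # ['TextA', 'TextC', 'TextC', 'TextE']
```

Note that the raw text nodes keep their surrounding whitespace and line breaks, which is why describe strips them before printing.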

har07 answered May 19 '26 04:05


