Based on the following HTML, I want to extract TextA, TextC and TextE.
<div id='content'>
TextA
<br/>
<br/>
<p>TextB</p>
TextC
<br/>
TextC
<p>TextD</p>
TextE
</div>
I tried to get TextC like so but I don't get the result I want:
//*[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]
Expected result: ["TextC", <br/>, "TextC"]
Actual result: [<br/>]
Is there a way to select the text nodes without using indexes like //div/text()[1]?
The two text nodes aren't in the result of your XPath because * only matches elements. To match both elements and text nodes, you can use node() instead:
//node()[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]
Or, if you want only the text nodes (i.e. excluding <br/>), you can use text() instead of node():
//text()[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]
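To see the difference between the three node tests, here is a minimal sketch in Python. It assumes the third-party lxml library is installed and that the snippet from the question parses as well-formed XML; the variable names (`between`, `elements`, `nodes`, `texts`) are just illustrative.

```python
from lxml import etree  # assumption: lxml is available (pip install lxml)

# The markup from the question, parsed as XML.
html = """<div id='content'>
TextA
<br/>
<br/>
<p>TextB</p>
TextC
<br/>
TextC
<p>TextD</p>
TextE
</div>"""

root = etree.fromstring(html)

# Shared predicate: nodes after the <p> containing TextB
# and before the <p> containing TextD.
between = '[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]'

# * matches elements only, so just the <br/> is selected.
elements = root.xpath("//*" + between)

# node() matches elements and text nodes alike.
nodes = root.xpath("//node()" + between)

# text() matches text nodes only; strip the surrounding newlines.
texts = [t.strip() for t in root.xpath("//text()" + between)]

print([e.tag for e in elements])
print(len(nodes))
print(texts)  # ['TextC', 'TextC']
```

Note that the text inside the two p elements is not selected: an element is an ancestor of its own text, and the preceding and following axes exclude ancestors.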