
XPath: select text after certain tag and before same next tag

Tags:

xpath

I have HTML code like this:

<strong>Term:</strong>
Some text<br />
More text<br />
Some more lines of text
<strong>Term:</strong>
Some text<br />
More text<br />
Some more lines of text
<strong>Second term:</strong>
Some text<br />
More text<br />
Some more lines of text
<strong>Term:</strong>
Some text<br />
More text<br />
Some more lines of text

I need to get the text nodes that come after a tag containing the text "Term" and before the next tag:

Some text
More text
Some more lines of text
Some text
More text
Some more lines of text
Some text
More text
Some more lines of text

The condition could be that the preceding tag must contain the text "Term", but I don't know how to write an XPath selector like this.

Stephan Olmer asked Jun 21 '11 09:06


2 Answers

//text()[preceding::*[contains(text(),'Term:')] and following::*[contains(text(),'Term:')]]

This is similar to what empo suggested. However, I look for any node containing Term: and return all the text nodes that sit between two such nodes.

This works only if no other kind of "Term" heading appears in between. Let me know if your input has one, because in that case this XPath will also return some unwanted values.

Since you have now updated the input, I have simply added one more condition to the previous XPath:

//text()[preceding::*[contains(text(),'Term:')] and following::*[contains(text(),'Term:')] and not(contains(., 'Term:'))]

@empo's solution also works, but it takes the <strong> element into account. The XPath I have written simply checks for the string 'Term:' and returns all the text nodes between two elements that contain it.
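
To see what the second expression selects without an XPath engine at hand, here is a rough stdlib-only emulation in Python. It is a sketch under assumptions, not the XPath itself: it only handles a flat document, the question's snippet is wrapped in a hypothetical <root> element, numeric suffixes are added so the blocks can be told apart, and the names DOC and emulate_between_terms are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The question's snippet, wrapped in an assumed <root>; suffixes 1-4 are added for clarity.
DOC = """<root>
<strong>Term:</strong>
Some text1<br />
More text1<br />
Some more lines of text1
<strong>Term:</strong>
Some text2<br />
More text2<br />
Some more lines of text2
<strong>Second term:</strong>
Some text3<br />
More text3<br />
Some more lines of text3
<strong>Term:</strong>
Some text4<br />
More text4<br />
Some more lines of text4
</root>"""


def emulate_between_terms(root, needle="Term:"):
    """Approximate, for a flat root, the predicate
    preceding::*[contains(text(), needle)] and following::*[contains(text(), needle)]
    and not(contains(., needle)) applied to every text node."""
    children = list(root)
    # True for each child whose first text contains the needle (case-sensitive).
    marks = [needle in (c.text or "") for c in children]
    selected = []
    for i, child in enumerate(children):
        following = any(marks[i + 1:])
        # child.text lies inside the child, so only earlier siblings precede it;
        # child.tail comes after the child has closed, so the child precedes it too.
        for text, preceding in ((child.text, any(marks[:i])),
                                (child.tail, any(marks[:i + 1]))):
            # Whitespace-only text is skipped here purely for readability.
            if text and text.strip() and preceding and following and needle not in text:
                selected.append(text.strip())
    return selected


if __name__ == "__main__":
    for line in emulate_between_terms(ET.fromstring(DOC)):
        print(line)
```

If the emulation is faithful, the result keeps "Second term:" and the three lines after it (the match is case-sensitive, and a 'Term:' element both precedes and follows them), and it omits the block after the last Term:, because nothing containing 'Term:' follows it.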

Let me know if this works for you.

Regards.

Ravish answered Sep 30 '22 11:09


Your question is still ambiguous and your input document is not well-formed. Try this:

root/text()[preceding::strong[1][contains(text(),'Term')]]

Applied on:

<root>
<strong>Term:</strong>
Some text<br />
More text<br />
Some more lines of text
<strong>Term:</strong>
Some text2<br />
More text2<br />
Some more lines of text2
<strong>Second term:</strong>
Some text3<br />
More text3<br />
Some more lines of text3
<strong>Term:</strong>
Some text4<br />
More text4<br />
Some more lines of text4
</root>

produces:

Some text
More text
Some more lines of text

Some text2
More text2
Some more lines of text2

Some text4
More text4
Some more lines of text4
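
If no XPath engine with the preceding:: axis is available, the same "nearest preceding <strong> contains Term" rule can be sketched with Python's standard library. This is a minimal illustration, not a replacement for the XPath: it assumes a flat document like the sample above, and the names DOC and text_after_matching_strong are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the answer above.
DOC = """<root>
<strong>Term:</strong>
Some text<br />
More text<br />
Some more lines of text
<strong>Term:</strong>
Some text2<br />
More text2<br />
Some more lines of text2
<strong>Second term:</strong>
Some text3<br />
More text3<br />
Some more lines of text3
<strong>Term:</strong>
Some text4<br />
More text4<br />
Some more lines of text4
</root>"""


def text_after_matching_strong(root, needle="Term"):
    """Collect text whose nearest preceding <strong> sibling contains `needle`."""
    collected = []
    keep = False
    for child in root:
        if child.tag == "strong":
            # Each heading switches collection on or off; the test is
            # case-sensitive, so "Second term:" does not match "Term".
            keep = needle in (child.text or "")
        # ElementTree stores the text that follows an element in its .tail.
        if keep and child.tail and child.tail.strip():
            collected.append(child.tail.strip())
    return collected


if __name__ == "__main__":
    for line in text_after_matching_strong(ET.fromstring(DOC)):
        print(line)
```

The walk relies on ElementTree keeping the text that follows an element in that element's tail attribute, so each <strong> or <br /> carries the line that comes right after it.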

This XPath selects all text nodes that follow an element containing the string Term: and precede an element containing any text:

//text()[preceding::*[contains(text(),'Term:')] and following::*[text()]]

Applied on:

<root>
<strong>Term:</strong>
Some text<br />
More text<br />
Some more lines of text
<strong>Second term:</strong>
Some text2<br />
More text2<br />
Some more lines of text2
</root>

Returns:

Some text
More text
Some more lines of text
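
The two conditions in that predicate can be checked by hand with a short Python sketch. It is only an approximation for a flat document such as the sample above, and DOC and emulate_until_labelled are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

# Two-block sample copied from the answer above.
DOC = """<root>
<strong>Term:</strong>
Some text<br />
More text<br />
Some more lines of text
<strong>Second term:</strong>
Some text2<br />
More text2<br />
Some more lines of text2
</root>"""


def emulate_until_labelled(root, needle="Term:"):
    """Keep text that has an element containing `needle` before it
    and any element with text of its own after it."""
    children = list(root)
    has_needle = [needle in (c.text or "") for c in children]
    has_text = [bool(c.text) for c in children]  # <br /> has no text
    selected = []
    for i, child in enumerate(children):
        followed = any(has_text[i + 1:])
        # Text inside the child is preceded by earlier siblings only;
        # the tail is also preceded by the child itself.
        for text, preceded in ((child.text, any(has_needle[:i])),
                               (child.tail, any(has_needle[:i + 1]))):
            if text and text.strip() and preceded and followed:
                selected.append(text.strip())
    return selected


if __name__ == "__main__":
    for line in emulate_until_labelled(ET.fromstring(DOC)):
        print(line)
```

The lines after "Second term:" are dropped because only <br /> elements follow them, and those contain no text.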

Emiliano Poggi answered Sep 30 '22 11:09