Using XPath to get text of paragraph with links inside

Tags: html, xpath

I'm parsing an HTML page with XPath and want to grab the whole text of a specific paragraph, including the text of any links inside it.

For example, I have the following paragraph:

<p class="main-content">
    This is sample paragraph with <a href="http://google.com">link</a> inside.
</p>

I need to get the following text as the result: "This is sample paragraph with link inside". However, applying "//p[@class='main-content']/text()" gives me only "This is sample paragraph with inside".

Could you please assist? Thanks.

Alex Silachev asked Nov 09 '11 14:11


1 Answer

To get the whole text content of a node, use the string function:

string(//p[@class="main-content"])
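As a rough illustration of what the string value contains, here is a minimal Python sketch. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath and has no string() function, so joining itertext() stands in for string() here; that substitution is my assumption, not part of the question. The wrapping <div> and the variable names are made up for the example, and the markup has to be well-formed XML for this parser to accept it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the <div> wrapper is only there to give the
# paragraph a parent, it is not part of the original question.
HTML = """<div>
<p class="main-content">
    This is sample paragraph with <a href="http://google.com">link</a> inside.
</p>
</div>"""

root = ET.fromstring(HTML)

# ElementTree understands attribute predicates, but not string() or text().
p = root.find(".//p[@class='main-content']")

# Concatenating every descendant text node mirrors what string(node) returns.
string_value = "".join(p.itertext())

# Collapse runs of whitespace, similar in effect to XPath's normalize-space().
normalized = " ".join(string_value.split())

print(normalized)
```

The raw string value keeps the newlines and indentation from the markup, which is why the sketch collapses whitespace before printing.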

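Note that this returns a single string value, not nodes. Your original expression fails because /text() selects only the text nodes that are direct children of the paragraph, so the text inside the link is skipped. If you want text nodes (as returned by text()), you need to search at all depths: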

//p[@class="main-content"]//text()

This returns three text nodes: "This is sample paragraph with ", "link" and " inside." (the first and last also carry the surrounding line breaks and indentation).
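To see the difference between direct-child text nodes and text nodes at all depths, here is a small Python sketch. It again uses the standard library's xml.etree.ElementTree as a stand-in, since that module does not implement text() in XPath: the element's .text plus each child's .tail approximates p/text(), and itertext() approximates p//text(). The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring(
    '<p class="main-content">\n'
    '    This is sample paragraph with <a href="http://google.com">link</a> inside.\n'
    '</p>'
)

# Text nodes that are direct children of <p>: roughly what p/text() selects.
# In ElementTree these are p.text and the .tail of each child element.
direct = [t for t in [p.text, *(child.tail for child in p)] if t]

# Text nodes at any depth below <p>: roughly what p//text() selects.
all_depths = list(p.itertext())

print([t.strip() for t in direct])
print([t.strip() for t in all_depths])
```

The first list is missing "link", which matches the result described in the question; the second list contains all three pieces in document order.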

lonesomeday answered Sep 28 '22 16:09