Select nodeValue but exclude child elements

Question

Let's say I have this code:

<p dataname="description">
Hello this is a description. <a href="#">Click here for more.</a>
</p>

How do I select the nodeValue of p but exclude a and its content?

My current code:

$result = $xpath->query("//p[@dataname='description'][not(self::a)]");

I select it by $result->item(0)->nodeValue;

Kristofer · Accepted Answer

Simply appending /text() to your query should do the trick

$result = $xpath->query("//p[@dataname='description'][not(self::a)]/text()");
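For reference, here is a minimal end-to-end sketch of that approach, assuming the markup from the question is loaded with DOMDocument. The variable names `$html`, `$dom` and `$description` are illustrative and not from the original post, and the redundant `[not(self::a)]` predicate is left out. Since `text()` can return more than one text node (one before and one after the link), the sketch joins them before trimming.

```php
<?php
// Sample markup from the question.
$html = '<p dataname="description">
Hello this is a description. <a href="#">Click here for more.</a>
</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// text() matches only the text nodes that are direct children of <p>,
// so the text inside <a> is not included.
$nodes = $xpath->query("//p[@dataname='description']/text()");

// Join every matched text node, then strip the surrounding whitespace.
$description = '';
foreach ($nodes as $node) {
    $description .= $node->nodeValue;
}
$description = trim($description);

echo $description, PHP_EOL;
```

Reading only `$result->item(0)->nodeValue` would also work for the markup in the question, but it would miss any text that comes after the link.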

Sjaak Trekhaak · Answer

Unsure if PHP's XPath supports this, but this XPath does the trick for me in Scrapy (Python based scraping framework):

$xpath->query("//p[@dataname='description']/text()[following-sibling::a]")
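To illustrate what the `[following-sibling::a]` predicate changes, here is a small sketch comparing it with a plain `text()` step. The markup and the `joinNodes()` helper are made up for this illustration. The predicate keeps only text nodes that are followed by an `a` sibling, so text after the link is dropped.

```php
<?php
// Hypothetical markup with text on both sides of the link.
$html = '<p dataname="description">Before the link. <a href="#">More</a> After the link.</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Illustrative helper: concatenate the values of all matched nodes.
function joinNodes(DOMNodeList $nodes): string
{
    $out = '';
    foreach ($nodes as $node) {
        $out .= $node->nodeValue;
    }
    return trim($out);
}

// Every direct text child of <p>, on either side of the link.
$all = joinNodes($xpath->query("//p[@dataname='description']/text()"));

// Only the text children that have an <a> somewhere after them.
$beforeLink = joinNodes(
    $xpath->query("//p[@dataname='description']/text()[following-sibling::a]")
);

echo $all, PHP_EOL;
echo $beforeLink, PHP_EOL;
```

Which variant is preferable depends on whether the trailing text belongs to the description.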

If this doesn't work, try Kristofer's solution, or you could also use a regex solution. For example:

$output = preg_replace("~<.*?>.*?<.*?>~msi", '', $result->item(0)->nodeValue);

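That'll remove any HTML element together with its content, leaving only text that is not wrapped in tags. Note that nodeValue already returns the text with the tags stripped, so the pattern only has something to match when it is applied to the raw markup instead.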
