Let's say I have this code:
<p dataname="description">
Hello this is a description. <a href="#">Click here for more.</a>
</p>
How do I select the nodeValue of p but exclude a and its content?
My current code:
$result = $xpath->query("//p[@dataname='description'][not(self::a)]");
I select it by $result->item(0)->nodeValue;
Simply appending /text() to your query should do the trick, since it selects only the text nodes that are direct children of p:
$result = $xpath->query("//p[@dataname='description'][not(self::a)]/text()");
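For completeness, here is a minimal, self-contained sketch of that approach using the markup from the question. The variable names ($html, $doc, $description) are only illustrative, and the [not(self::a)] predicate is left out because it is always true for a p element. Note that text() may return more than one text node (for instance when there is text after the link), so the sketch joins them all.

```php
<?php
// Sample markup taken from the question.
$html = '<p dataname="description">
Hello this is a description. <a href="#">Click here for more.</a>
</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// text() matches only the text nodes that are direct children of p,
// so the text inside <a> is never selected.
$nodes = $xpath->query("//p[@dataname='description']/text()");

// Join the pieces in case there is text both before and after the link.
$description = '';
foreach ($nodes as $node) {
    $description .= $node->nodeValue;
}
$description = trim($description);

echo $description, PHP_EOL;
```

If only the first piece of text matters, $nodes->item(0)->nodeValue is enough.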
I'm unsure whether PHP's XPath supports this, but this XPath does the trick for me in Scrapy (a Python-based scraping framework):
$xpath->query("//p[@dataname='description']/text()[following-sibling::a]");
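PHP's DOMXPath implements XPath 1.0, which includes the following-sibling axis, so this expression should work there too. The sketch below uses a hypothetical variation of the question's markup (with extra text after the link) to show how the predicate differs from a plain /text(): it keeps only the text nodes that still have an a element after them.

```php
<?php
// Hypothetical variation of the question's markup, with text after the link.
$html = '<p dataname="description">Intro text. <a href="#">More</a> Trailing text.</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// All direct text children of p: the text before and after the link.
$all = $xpath->query("//p[@dataname='description']/text()");

// Only text nodes followed by an <a> sibling, i.e. the text before the link.
$before = $xpath->query("//p[@dataname='description']/text()[following-sibling::a]");

echo $all->length, PHP_EOL;
echo trim($before->item(0)->nodeValue), PHP_EOL;
```

So use the predicate when text after the link should be ignored, and the plain /text() when all of the paragraph's own text is wanted.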
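If this doesn't work, try Kristoffer's solution, or you could also use a regex solution. Keep in mind that nodeValue already returns plain text with every tag stripped, so the regex has to run on the markup inside p instead (assumed here to be in a hypothetical $innerHtml string, for example built by calling DOMDocument::saveHTML() on each child node). For example:
$output = preg_replace("~<.*?>.*?<.*?>~msi", '', $innerHtml);
That'll remove each HTML element together with its content and keep only the text that is not enclosed in tags.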
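As a rough sketch of the regex route: the pattern needs markup as input, and nodeValue contains none, so the example below first rebuilds the inner HTML of p. The $innerHtml variable and the loop that fills it are assumptions added for illustration and are not part of the original answer.

```php
<?php
// Sample markup taken from the question.
$html = '<p dataname="description">
Hello this is a description. <a href="#">Click here for more.</a>
</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$p = $xpath->query("//p[@dataname='description']")->item(0);

// Rebuild the inner HTML of p by serializing each child node.
$innerHtml = '';
foreach ($p->childNodes as $child) {
    $innerHtml .= $doc->saveHTML($child);
}

// Drop every "<tag>...</tag>" run, keeping only the bare text.
$output = trim(preg_replace("~<.*?>.*?<.*?>~msi", '', $innerHtml));

echo $output, PHP_EOL;
```

This pattern is fragile: nested elements or void tags such as br can make it remove too much or too little, so the XPath approaches are usually the safer choice.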