
Return full text element (including child/descendant elements)

Tags: xpath, nokogiri

I'm trying to get the text of the first <p> on the page that is a child of a <div>, and only that first <p>. The <p> contains other tags (<b>, <a href>), and the returned text stops at the first embedded tag. Is there a way to make this line return all the text between <p> and </p>, including the text inside embedded tags?

puts doc.xpath('html/body/div/p[1]/text()').first
chuckfinley asked Mar 21 '26 14:03


1 Answer

Use:

string((//div/p)[1])

When this XPath expression is evaluated, the result is the string value of the first p in the document that is a child of a div.

By definition, the string value of an element is the concatenation (in document order) of all of its text-node descendants.

Therefore, you get exactly the text in the subtree rooted at this p element, with all other nodes (elements, comments, PIs) skipped.
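The definition above can be sketched by hand in Ruby. This is only an illustration, not how any XPath engine actually implements string(): it uses Ruby's bundled REXML library instead of Nokogiri so that it runs without extra gems. The helper name string_value, the constant SAMPLE_XML, the wrapping div, and the second paragraph are all assumptions made for the example.

```ruby
require 'rexml/document'

# Sample markup modelled on the answer's test document. The wrapping <div>
# and the second <p> are assumptions added for illustration.
SAMPLE_XML = '<div><p>Hello <b><a href="http://www.w3.org/TR/2008/REC-xml-20081126/">XML</a> World!</b></p><p>Second paragraph</p></div>'

# Hypothetical helper: builds the XPath string value of an element by
# concatenating its text-node descendants in document order.
def string_value(node)
  node.children.map do |child|
    case child
    when REXML::Text    then child.value          # text node: take its text
    when REXML::Element then string_value(child)  # element: recurse into its subtree
    else ''                                       # comments and PIs are skipped
    end
  end.join
end

doc = REXML::Document.new(SAMPLE_XML)
first_p = REXML::XPath.first(doc, '//div/p')      # first p that is a child of a div

puts string_value(first_p)                        # prints "Hello XML World!"
```

With Nokogiri, passing the answer's expression to doc.xpath should return the same concatenated text directly as a Ruby String, so a helper like this is not needed there.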

XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
     <xsl:copy-of select="string(p)"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the following XML document (the question did not provide one):

<p>
 Hello <b>
  <a href="http://www.w3.org/TR/2008/REC-xml-20081126/">XML</a>
   World!</b>
</p>

the result of the evaluated XPath expression is output:

 Hello XML
   World!
Dimitre Novatchev answered Mar 25 '26 01:03


