How to select only leaf nodes with Nokogiri?

Question

I am looking for advice on how to select only the leaf nodes of a document. I would like a solution that uses only XPath.

An html example:

<div>
  <div>
    <div>text div (leaf)</div>
    <p>text paragraph (leaf)</p>
  </div>
</div>
<p>text paragraph 2 (leaf)</p>

Code:

doc = Nokogiri::HTML.fragment("- the html above -")
result = doc.xpath("*[not(child::*)]")


[#<Nokogiri::XML::Element:0x3febf50f9328 name="p" children=[#<Nokogiri::XML::Text:0x3febf519b718 "text paragraph 2 (leaf)">]>]

But this XPath only gives me the last "p". What I want is something like a flatten: a result that contains only the leaf nodes, whatever their depth.

Here are some related answers on Stack Overflow:

How to select all leaf nodes using XPath expression?

XPath - Get node with no child of specific type

Thanks

Justin Ko · Accepted Answer

You can find all element nodes that have no child elements using:

//*[not(*)]

Example:

require 'nokogiri'

doc = Nokogiri::HTML.parse <<-end
<div>
  <div>
    <div>text div (leaf)</div>
    <p>text paragraph (leaf)</p>
  </div>
</div>
<p>text paragraph 2 (leaf)</p>
end

puts doc.xpath('//*[not(*)]').length
#=> 3

doc.xpath('//*[not(*)]').each do |e|
  puts e.text
end
#=> "text div (leaf)"
#=> "text paragraph (leaf)"
#=> "text paragraph 2 (leaf)"


Tags:

ruby

xpath

nokogiri

Luccas
