How do you calculate the number of "levels" of descendants of a Nokogiri node?

You can call Nokogiri::XML::Node#ancestors.size to see how deeply a node is nested. But is there a way to determine how deeply nested the most deeply nested child of a node is?

Alternatively, how can you find all the leaf nodes that descend from a node?

asked Nov 04 '22 23:11 by dan


2 Answers

The following code monkey-patches Nokogiri::XML::Node for fun, but of course you can extract these as standalone methods that take a node argument if you prefer. (The height method answers your main question and leaves answers your alternative one; I added deepest_leaves because I thought it might be interesting.)

require 'nokogiri'
class Nokogiri::XML::Node
  def depth
    ancestors.size
    # The following is ~10x slower: xpath('count(ancestor::node())').to_i
  end
  def leaves
    xpath('.//*[not(*)]').to_a
  end
  def height
    tallest = leaves.map{ |leaf| leaf.depth }.max
    tallest ? tallest - depth : 0
  end
  def deepest_leaves
    by_depth = leaves.group_by{ |leaf| leaf.depth }
    by_depth[ by_depth.keys.max ]
  end
end

doc = Nokogiri::XML "<root>
  <a1>
    <b1></b1>
    <b2><c1><d1 /><d2><e1 /><e2 /></d2></c1><c2><d3><e3/></d3></c2></b2>
  </a1>
  <a2><b><c><d><e><f /></e></d></c></b></a2>
</root>"

a1 = doc.at_xpath('//a1')
p a1.height                      #=> 4
p a1.deepest_leaves.map(&:name)  #=> ["e1", "e2", "e3"]
p a1.leaves.map(&:name)          #=> ["b1", "d1", "e1", "e2", "e3"]

Edit: To answer just the question asked, tersely and without wrapping it in reusable pieces:

p a1.xpath('.//*[not(*)]').map{ |n| n.ancestors.size }.max - a1.ancestors.size
answered Nov 09 '22 10:11 by Phrogz


You can call Nokogiri::XML::Node#ancestors.size to see how deeply a node is nested. But is there a way to determine how deeply nested the most deeply nested child of a node is?

Use:

count(ancestor::node())

This expression evaluates to the number of ancestors the context (current) node has in the document hierarchy.

To find the nesting level of the "most deeply nested child", one must first select all "leaf" nodes:

descendant-or-self::node()[not(node())]

and, for each of them, compute its nesting level using the XPath expression above.

Finally, the maximum of all these nesting levels has to be calculated, and this last step cannot be expressed in pure XPath 1.0.

In XPath 2.0, the whole computation can be written as a single expression:

max(for $leaf in /descendant-or-self::node()[not(node())],
        $depth in count($leaf/ancestor::node())
      return
        $depth
    )

Update:

It is possible to shorten this XPath 2.0 expression even more:

max(/descendant-or-self::node()[not(node())]/count(ancestor::node()))
answered Nov 09 '22 08:11 by Dimitre Novatchev