
What is the most efficient way to find the relative XPath between two elements?

Tags: dom, xml, perl, xpath

Having looked at various popular modules for working with XML / XPath, I have yet to see a straightforward way to achieve this.

Essentially the interface would look something like:

my $xpath = get_path($node1, $node2);

...which would return the relative path from $node1 to $node2.

I include my own time in the calculation of 'efficiency' - I'll take any existing solution for this problem. Failing that, I'd like to know some of the pitfalls one might come up against in any 'obvious' home-grown solutions.

Off the top of my head, I could imagine first searching for $node2 among $node1's descendants, then, failing that, walking up $node1's ancestors and repeating the search from each one. Would that be as raucously resource-intensive as I fear?

For my particular use-case, I can assume the absolute paths of both $node1 and $node2 are known. Given that, I would like to think there's some 'XPath math' that could be done between the two full paths without having to run about all over the tree, but I don't know what that process would look like.

To summarise:

1) Do any existing CPAN modules make what I want to do easy?

2) If not, what's an efficient way to go about it?

Ryan Jendoubi asked Aug 16 '11 17:08



1 Answer

Find the absolute path for both nodes.

ref:    root foo bar[2] baz[1] moo
target: root foo bar[2] baz[2] moo

Remove common leading segments.

ref:    baz[1] moo
target: baz[2] moo

For each segment remaining in the reference, prepend a .. segment to the target.

.. .. baz[2] moo

Convert to XPath.

../../baz[2]/moo

Code:

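use strict;
use warnings;
use XML::LibXML qw( XML_ATTRIBUTE_NODE XML_ELEMENT_NODE );

# Split a node's absolute path into segments, e.g. ('root', 'foo', 'bar[2]').
sub get_path_segs {
   my ($node) = @_;
   my @path = split(/\//, $node->nodePath());
   shift(@path);  # Drop the empty string before the leading "/".
   return @path;
}

# Return the relative XPath from $ref to $targ.
sub get_path {
   my ($ref, $targ) = @_;

   die "Reference must be an element or attribute node\n"
      if $ref->nodeType()  != XML_ELEMENT_NODE && $ref->nodeType()  != XML_ATTRIBUTE_NODE;
   die "Target must be an element or attribute node\n"
      if $targ->nodeType() != XML_ELEMENT_NODE && $targ->nodeType() != XML_ATTRIBUTE_NODE;

   my @ref  = get_path_segs($ref);
   my @targ = get_path_segs($targ);

   # Remove common leading segments.
   while (@ref && @targ && $ref[0] eq $targ[0]) {
      shift(@ref);
      shift(@targ);
   }

   # Prepend one ".." to the target for each remaining reference segment.
   while (@ref) {
      pop(@ref);
      unshift(@targ, '..');
   }

   # Empty means both nodes are the same node.
   return @targ ? join('/', @targ) : '.';
}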

It currently supports element and attribute nodes. It could be expanded to support other node types, possibly trivially.
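Since the question says the absolute paths of both nodes are already known, the same steps can be applied to the path strings directly, without touching the tree. What follows is only a minimal sketch under a few assumptions: both paths are written in the same form (for example, both produced by nodePath()), no "/" appears inside a predicate, and the names path_segs and rel_path_from_strings are hypothetical helpers, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: split an absolute path such as /root/foo/bar[2]
# into its segments. Assumes no "/" appears inside a predicate.
sub path_segs {
   my ($path) = @_;
   die "Not an absolute path: $path\n" if $path !~ m{^/};
   my @segs = split(m{/}, $path);
   shift(@segs);  # Drop the empty string before the leading "/".
   return @segs;
}

# Hypothetical helper: relative path from $ref_path to $targ_path,
# using string comparison of segments only.
sub rel_path_from_strings {
   my ($ref_path, $targ_path) = @_;

   my @ref  = path_segs($ref_path);
   my @targ = path_segs($targ_path);

   # Remove common leading segments.
   while (@ref && @targ && $ref[0] eq $targ[0]) {
      shift(@ref);
      shift(@targ);
   }

   # One ".." for each segment left in the reference.
   unshift(@targ, ('..') x @ref);

   return @targ ? join('/', @targ) : '.';
}

# The example from the steps above; prints ../../baz[2]/moo
print rel_path_from_strings(
   '/root/foo/bar[2]/baz[1]/moo',
   '/root/foo/bar[2]/baz[2]/moo',
), "\n";
```

Because segments are compared as plain strings, bar and bar[1] would be treated as different segments even when they select the same node, so the two paths should come from the same source.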

ikegami answered Nov 15 '22 07:11
