
Perl libXML find node by attribute value

Tags:

perl

libxml2

I have a very large XML document that I am iterating through. The XML uses mostly attributes rather than node values. I may need to find numerous nodes in the file to piece together one grouping of information; they are tied together via different ref attribute values. Currently, each time I need to locate one of the nodes to extract data from it, I loop through the entire XML and match on the attribute to find the correct node. Is there a more efficient way to select a node with a given attribute value instead of constantly looping and comparing? My current code is so slow it is almost useless.

Currently I am doing something like this many times over the same file, for many different node and attribute combinations.

my $searchID = "1234";
foreach my $nodes ($xc->findnodes('/plm:PLMXML/plm:ExternalFile')) {
    my $ID      = $nodes->findvalue('@id');
    my $File    = $nodes->findvalue('@locationRef');
    if ( $searchID eq $ID ) {
        print "The File Name = $File\n";
    }
}

In the above example I am looping and using an "if" to compare for an ID match. I was hoping I could do something like the code below to match the node by attribute instead. Would it be any more efficient than looping?

my $searchID = "1234";
$nodes = ($xc->findnodes('/plm:PLMXML/plm:ExternalFile[@id=$searchID]'));
my $File    = $nodes->findvalue('@locationRef');
print "The File Name = $File\n";
Brian asked Jan 08 '23 00:01



2 Answers

Do one pass to extract the information you need into a more convenient format or to build an index.

my %nodes_by_id;
for my $node ($xc->findnodes('//*[@id]')) {
    $nodes_by_id{ $node->getAttribute('id') } = $node;
}

Then each of your loops becomes a single hash lookup:

my $node = $nodes_by_id{'1234'};

(And use getAttribute rather than findvalue to read an attribute; findvalue has to evaluate an XPath expression on every call.)
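As a minimal end-to-end sketch of the indexing approach above: the sample document, its namespace URI, and the file names are assumptions made up for illustration; only the index-building loop comes from the answer.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical sample document; the namespace URI is an assumption.
my $xml = <<'XML';
<PLMXML xmlns="http://www.example.com/plmxml">
  <ExternalFile id="1234" locationRef="part_a.jt"/>
  <ExternalFile id="5678" locationRef="part_b.jt"/>
</PLMXML>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my $xc  = XML::LibXML::XPathContext->new($doc);
$xc->registerNs(plm => 'http://www.example.com/plmxml');

# One pass over the document: map each id attribute value to its element.
my %nodes_by_id;
for my $node ($xc->findnodes('//*[@id]')) {
    $nodes_by_id{ $node->getAttribute('id') } = $node;
}

# Every later lookup is a hash access, not a document scan.
my $search_id = '1234';
my $file;
if (my $node = $nodes_by_id{$search_id}) {
    $file = $node->getAttribute('locationRef');
    print "The File Name = $file\n";
}
```

If ids are only unique per element type, keying the hash on something like the element name plus the id would avoid collisions; whether that is needed depends on the document.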

ikegami answered Jan 19 '23 03:01



If you will be doing this for lots of IDs, then ikegami's answer is worth reading.

I was hoping I could do something like this below to just match the node by attribute instead

...

$nodes = ($xc->findnodes('/plm:PLMXML/plm:ExternalFile[@id=$searchID]'));

Sort of.

For a given ID, yes, you can do

$nodes = $xc->findnodes("/plm:PLMXML/plm:ExternalFile[\@id=$searchID]");

... provided that $searchID is known to be numeric. Note that double quotes in Perl interpolate variables, so the @ in @id must be escaped: it is part of the literal XPath string, not a Perl array. $searchID is left unescaped because its value should become part of the XPath string.

Note also that because findnodes is called in scalar context here, $nodes will hold an XML::LibXML::NodeList object, not the actual node and not an arrayref. To get an arrayref, use square brackets instead of round ones, as I have done in the next example.
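To illustrate the context difference, here is a small sketch on a made-up document (the element and attribute names are assumptions, not taken from the question's file). It uses the size and get_node methods of XML::LibXML::NodeList.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data, for illustration only.
my $doc = XML::LibXML->load_xml(
    string => '<root><item id="1" name="a"/><item id="2" name="b"/></root>'
);

# Scalar context: an XML::LibXML::NodeList object.
my $nodelist = $doc->findnodes('/root/item[@id=2]');
my $count    = $nodelist->size;          # number of matches
my $first    = $nodelist->get_node(1);   # positions are 1-based

# List context: a plain Perl list of nodes.
my ($node) = $doc->findnodes('/root/item[@id=2]');

print ref($nodelist), "\n";
print $first->getAttribute('name'), "\n" if $first;
```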

Alternatively, if your search id may not be numeric but you know for sure that it is safe to be put in an XPath string (e.g. doesn't have any quotes), you can do the following:

$nodes = [ $xc->findnodes('/plm:PLMXML/plm:ExternalFile[@id="' . $searchID . '"]') ];
print $nodes->[0]->getAttribute('locationRef'); # if you're 100% sure it exists

Notice here that the resulting string will enclose the value in quotation marks.

Finally, it is possible to skip straight to:

print $xc->findvalue('/plm:PLMXML/plm:ExternalFile[@id="' . $searchID . '"]/@locationRef');

... provided you know that there is only one node with that id; if several nodes match, findvalue returns their values concatenated.
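If the search id cannot be trusted to be free of quotes, one option is to build the XPath string literal defensively. The helper below, xpath_literal, is a hypothetical name introduced here, not part of XML::LibXML; it relies only on the XPath 1.0 concat() function, and the sample document is likewise made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: turn an arbitrary Perl string into an XPath 1.0
# string expression. XPath literals have no escape syntax, so a value
# containing both quote kinds is assembled with concat().
sub xpath_literal {
    my ($s) = @_;
    return qq("$s") if $s !~ /"/;
    return qq('$s') if $s !~ /'/;
    # Split on double quotes (keeping empty pieces), then rejoin the
    # pieces with a single-quoted double quote between them.
    my @parts = map { qq("$_") } split /"/, $s, -1;
    return 'concat(' . join(q(, '"', ), @parts) . ')';
}

# Made-up document with an id containing both quote characters.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<root>
  <item id="plain" name="plain.jt"/>
  <item id="it's &quot;x&quot;" name="tricky.jt"/>
</root>
XML

my $search_id = q(it's "x");
my $xpath     = '/root/item[@id=' . xpath_literal($search_id) . ']/@name';
my $name      = $doc->findvalue($xpath);
print "$name\n";
```

For many lookups, the one-pass hash index from the other answer avoids the quoting question entirely, since the id never enters an XPath string.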

user52889 answered Jan 19 '23 04:01
