
Using Perl XML::LibXML to parse

Tags: perl, libxml2

I am using Perl's XML::LibXML module to parse an XML response from a device. It appears that the only way I can successfully get my data is by modifying the XML response first. Here is the XML response from the device:

<chassis-inventory xmlns="http://xml.juniper.net/junos/10.3D0/junos-chassis">

<chassis junosstyle="inventory">

<name>Chassis</name>

<serial-number>JN111863EAFF</serial-number>

<description>VJX1000</description>

<chassis-module>

<name>Midplane</name>

</chassis-module>

<chassis-module>

<name>System IO</name>

</chassis-module>

<chassis-module>

<name>Routing Engine</name>

<description>VJX1000</description>

<chassis-re-disk-module>

<name>ad0</name>

<disk-size>1953</disk-size>

<model>QEMU HARDDISK</model>

<serial-number>QM00001</serial-number>

<description>Hard Disk</description>

</chassis-re-disk-module>

</chassis-module>

<chassis-module>

<name>FPC 0</name>

<chassis-sub-module>

<name>PIC 0</name>

</chassis-sub-module>

</chassis-module>

<chassis-module>

<name>Power Supply 0</name>

</chassis-module>

</chassis>

</chassis-inventory>

Here is the Perl code I am using to parse the response and find, for example, the serial number:

#!/bin/env perl
use strict;
use warnings;
use XML::LibXML;
my $f = ("/var/working/xmlstuff");
sub yeah {
    my $ff;
    my $f = shift;
    open(my $fff,$f);
    while(<$fff>) {
        $_ =~ s/^\s+$//;
        $_ =~ s/^(<\S+)\s.*?=.*?((?:\/)?>)/$1$2/g;
        $ff .= $_;
    }
    close($fff);
    return $ff
}
my $tparse = XML::LibXML->new();
my $ss = $tparse->load_xml( string => &yeah($f));
print map $_->to_literal,$ss->findnodes('/chassis-inventory/chassis/serial-number');

If I do not use the regex substitutions, nothing is loaded for the script to parse. I can understand stripping the empty lines, but why do I have to remove the attributes from the XML response? The script only works if these lines:

<chassis-inventory xmlns="http://xml.juniper.net/junos/10.3D0/junos-chassis">

<chassis junosstyle="inventory">

become this:

<chassis-inventory>
<chassis>
  1. Is this a problem with the XML response or with the XML::LibXML module?

  2. Is there a way to have it ignore the empty lines in the file without using a regex substitution?

Thanks for the help.

salparadise asked Dec 13 '22 10:12



1 Answer

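Your XPath expression fails because of the namespace. The root element declares a default namespace with xmlns="...", so every element in the document belongs to that namespace, and you need to search in the context of it. The junosstyle attribute on the chassis element is harmless; only the xmlns declaration matters. Here's an explanation from the XML::LibXML documentation: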

NOTE ON NAMESPACES AND XPATH:

A common mistake about XPath is to assume that node tests consisting of an element name with no prefix match elements in the default namespace. This assumption is wrong - by XPath specification, such node tests can only match elements that are in no (i.e. null) namespace.

So, for example, one cannot match the root element of an XHTML document with $node->find('/html') since '/html' would only match if the root element had no namespace, but all XHTML elements belong to the namespace http://www.w3.org/1999/xhtml. (Note that xmlns="..." namespace declarations can also be specified in a DTD, which makes the situation even worse, since the XML document looks as if there was no default namespace).
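The point of the quoted note can be checked against a trimmed-down copy of the device response. This is only a sketch: the inline sample and the variable names are illustrative assumptions, and the blank lines are left in on purpose, since whitespace between elements is ordinary text to the parser and does not stop the document from loading.

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed-down copy of the response, blank lines included (illustrative sample).
my $xml_text = <<'XML';
<chassis-inventory xmlns="http://xml.juniper.net/junos/10.3D0/junos-chassis">

<chassis junosstyle="inventory">

<name>Chassis</name>

<serial-number>JN111863EAFF</serial-number>

</chassis>

</chassis-inventory>
XML

my $doc = XML::LibXML->load_xml( string => $xml_text );

# An unprefixed path only matches elements in no namespace.
my @plain = $doc->findnodes('/chassis-inventory/chassis/serial-number');
printf "unprefixed path: %d match(es)\n", scalar @plain;

# Matching on local-name() ignores the namespace, so the element is found
# and can report which namespace it actually lives in.
my ($serial) = $doc->findnodes('//*[local-name()="serial-number"]');
printf "element:   %s\n", $serial->nodeName;
printf "namespace: %s\n", $serial->namespaceURI;
printf "value:     %s\n", $serial->textContent;
```

The unprefixed query comes back empty even though the element is present, which is the symptom described in the question.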

To deal with this, register the namespace, then search your document using the namespace. Here's an example that should work for you:

#!/bin/env perl
use strict;
use warnings;
use XML::LibXML;

# Parse the file directly from disk.
my $xml = XML::LibXML->load_xml( location => '/var/working/xmlstuff');

# Bind the document's namespace to a prefix; the prefix name 'x' is arbitrary.
my $xpc = XML::LibXML::XPathContext->new($xml);
$xpc->registerNs('x', 'http://xml.juniper.net/junos/10.3D0/junos-chassis');

# Every step of the path carries the prefix.
foreach my $node ($xpc->findnodes('/x:chassis-inventory/x:chassis/x:serial-number')) {
    print $node->textContent() . "\n";
}
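The namespace URI above contains what looks like a Junos release string (10.3D0), so it may differ between devices or software versions; that is an assumption based on the shape of the URI, not something the question confirms. If it does vary, one possible variation is to read the URI from the root element instead of hard-coding it. The sketch below uses a trimmed inline sample in place of /var/working/xmlstuff, and its variable names are illustrative.

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed inline sample standing in for the file on disk (illustrative).
my $xml_text = <<'XML';
<chassis-inventory xmlns="http://xml.juniper.net/junos/10.3D0/junos-chassis">
<chassis junosstyle="inventory">
<name>Chassis</name>
<serial-number>JN111863EAFF</serial-number>
</chassis>
</chassis-inventory>
XML

my $doc  = XML::LibXML->load_xml( string => $xml_text );
my $root = $doc->documentElement;

# Take the namespace from the root element rather than hard-coding it.
my $ns = $root->namespaceURI
    or die "root element has no namespace\n";

my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs( 'x', $ns );

# findvalue returns the string value of the matched node(s).
my $found = $xpc->findvalue('/x:chassis-inventory/x:chassis/x:serial-number');
print "$found\n";
```

This assumes the elements of interest share the root element's namespace, as they do in the response shown in the question.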
Joel answered Dec 28 '22 06:12
