How do I get rid of unwanted nodes returned by findnodes from Perl's XML::LibXML module?

Question

The following is just a small fraction of the XML I am working on. I want to extract all attributes, tag names and text under the subtree.

<?xml version='1.0' encoding='UTF-8'?>
<Warehouse>
<Equipment id="ABC001" model="TV" version="3_00">
<attributes>
<Location>Chicago</Location>
<Latitude>30.970</Latitude>
<Longitude>-90.723</Longitude>
</attributes>
</Equipment></Warehouse>

I have written an example like this:

#!/usr/bin/perl
use XML::LibXML;
use Data::Dumper;

$parser = XML::LibXML->new();
$Chunk = $parser->parse_file("numone.xml");

@Equipment = $Chunk->findnodes('//Equipment');
foreach $at ($Equipment[0]->getAttributes()) {
    ($na,$nv) = ($at -> getName(),$at -> getValue());
    print "$na => $nv\n";
}

@Equipment = $Chunk->findnodes('//Equipment/attributes');
@Attr = $Equipment[0]->childNodes;
print Dumper(@Attr);

foreach $at (@Attr) {
    ($na,$nv) = ($at->nodeName, $at->textContent);
    print "$na => $nv\n";
}

I am getting the results like this:

id => ABC001
model => TV
version => 3_00
$VAR1 = bless( do{\(my $o = 10579528)}, 'XML::LibXML::Text' );
$VAR2 = bless( do{\(my $o = 13643928)}, 'XML::LibXML::Element' );
$VAR3 = bless( do{\(my $o = 13657192)}, 'XML::LibXML::Text' );
$VAR4 = bless( do{\(my $o = 13011432)}, 'XML::LibXML::Element' );
$VAR5 = bless( do{\(my $o = 10579752)}, 'XML::LibXML::Text' );
$VAR6 = bless( do{\(my $o = 10565696)}, 'XML::LibXML::Element' );
$VAR7 = bless( do{\(my $o = 13046400)}, 'XML::LibXML::Text' );
#text =>

Location => Chicago
#text =>

Latitude => 30.970
#text =>

Longitude => -90.723
#text =>

Extracting the attributes seems OK. However, extracting the tag names and text produces extra content. My questions are:

Where did those ::Text nodes come from?
How do I get rid of those extra nodes and the #text entries?

Thanks,

Borodin · Accepted Answer

First of all you really should use strict and use warnings at the start of your program, and declare all variables at the point of first use with my. This will reveal a lot of simple mistakes and is especially important in programs you are asking for help with.

As you have been told, the XML::LibXML::Text entries are whitespace text nodes. If you want the XML::LibXML parser to ignore them, set the no_blanks option on the parser object.

Also, you would be better off using the more recent load_xml method instead of the outdated parse_file, as below:

my $parser = XML::LibXML->new(no_blanks => 1);
my $Chunk = $parser->load_xml(location => "numone.xml");

The output from this changed version of the program looks like this:

id => ABC001
model => TV
version => 3_00
$VAR1 = bless( do{\(my $o = 7008120)}, 'XML::LibXML::Element' );
$VAR2 = bless( do{\(my $o = 7008504)}, 'XML::LibXML::Element' );
$VAR3 = bless( do{\(my $o = 7008144)}, 'XML::LibXML::Element' );
Location => Chicago
Latitude => 30.970
Longitude => -90.723
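For reference, the advice above (strict, warnings, my declarations, no_blanks and load_xml) can be combined into one self-contained script. This is only a sketch: it parses an inline copy of the sample document with load_xml(string => ...) instead of reading numone.xml, and the variable names ($xml, $doc, @pairs) are illustrative rather than taken from the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Inline copy of the sample XML from the question (assumption: the real
# program would read numone.xml with load_xml(location => ...) instead).
my $xml = <<'END_XML';
<?xml version='1.0' encoding='UTF-8'?>
<Warehouse>
<Equipment id="ABC001" model="TV" version="3_00">
<attributes>
<Location>Chicago</Location>
<Latitude>30.970</Latitude>
<Longitude>-90.723</Longitude>
</attributes>
</Equipment></Warehouse>
END_XML

# no_blanks asks the parser to drop whitespace-only text nodes.
my $parser = XML::LibXML->new(no_blanks => 1);
my $doc    = $parser->load_xml(string => $xml);

my @pairs;    # collected [name, value] pairs, in document order

# Attributes of the first Equipment element.
my ($equipment) = $doc->findnodes('//Equipment');
for my $attr ($equipment->attributes) {
    push @pairs, [ $attr->nodeName, $attr->getValue ];
}

# Child nodes of <attributes>; with no_blanks only the elements remain.
my ($container) = $doc->findnodes('//Equipment/attributes');
for my $child ($container->childNodes) {
    push @pairs, [ $child->nodeName, $child->textContent ];
}

print "$_->[0] => $_->[1]\n" for @pairs;
```

Taking the first match with my ($equipment) = ... replaces the $Equipment[0] indexing in the question and keeps every variable lexically scoped.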

Greg Bacon · Answer

The extra nodes are text nodes that contain only whitespace, e.g., the newlines between elements. Skip them if you want:

@Equipment = $Chunk->findnodes('//Equipment/attributes');
@Attr = $Equipment[0]->childNodes;
foreach $at (@Attr) {
    ($na,$nv) = ($at->nodeName, $at->textContent);

    next if $na eq "#text";  # skip text nodes between elements

    print "$na => $nv\n";
}

Output:

id => ABC001
model => TV
version => 3_00
Location => Chicago
Latitude => 30.970
Longitude => -90.723
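If re-parsing with different options is not convenient, the whitespace nodes can also be left out at selection time instead of being skipped inside the loop. The sketch below shows two ways that should work with XML::LibXML: an XPath step of * (which matches element children only) and the nonBlankChildNodes method. It uses an inline copy of the sample XML, and the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Inline copy of the sample XML from the question.
my $xml = <<'END_XML';
<Warehouse>
<Equipment id="ABC001" model="TV" version="3_00">
<attributes>
<Location>Chicago</Location>
<Latitude>30.970</Latitude>
<Longitude>-90.723</Longitude>
</attributes>
</Equipment></Warehouse>
END_XML

# Default parser options: whitespace-only text nodes stay in the tree.
my $doc = XML::LibXML->load_xml(string => $xml);
my ($container) = $doc->findnodes('//Equipment/attributes');

# All children, including the whitespace text nodes between elements.
my @all = $container->childNodes;

# Option 1: the XPath step "*" selects element children only.
my @elements = $doc->findnodes('//Equipment/attributes/*');

# Option 2: like childNodes, but without whitespace-only text nodes.
my @non_blank = $container->nonBlankChildNodes;

print $_->nodeName, ' => ', $_->textContent, "\n" for @elements;
```

The XPath form is the stricter of the two: it returns elements only, whereas nonBlankChildNodes would still return text nodes that contain non-whitespace characters.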

Tags: xml, perl, libxml2

mkt2012