The following is just a small fraction of the XML I am working on. I want to extract all attributes, tag names and texts under the subtree.
<?xml version='1.0' encoding='UTF-8'?>
<Warehouse>
<Equipment id="ABC001" model="TV" version="3_00">
<attributes>
<Location>Chicago</Location>
<Latitude>30.970</Latitude>
<Longitude>-90.723</Longitude>
</attributes>
</Equipment></Warehouse>
I have written code like this:
#!/usr/bin/perl
use XML::LibXML;
use Data::Dumper;
$parser = XML::LibXML->new();
$Chunk = $parser->parse_file("numone.xml");
@Equipment = $Chunk->findnodes('//Equipment');
foreach $at ($Equipment[0]->getAttributes()) {
($na,$nv) = ($at -> getName(),$at -> getValue());
print "$na => $nv\n";
}
@Equipment = $Chunk->findnodes('//Equipment/attributes');
@Attr = $Equipment[0]->childNodes;
print Dumper(@Attr);
foreach $at (@Attr) {
($na,$nv) = ($at->nodeName, $at->textContent);
print "$na => $nv\n";
}
I am getting results like this:
id => ABC001
model => TV
version => 3_00
$VAR1 = bless( do{\(my $o = 10579528)}, 'XML::LibXML::Text' );
$VAR2 = bless( do{\(my $o = 13643928)}, 'XML::LibXML::Element' );
$VAR3 = bless( do{\(my $o = 13657192)}, 'XML::LibXML::Text' );
$VAR4 = bless( do{\(my $o = 13011432)}, 'XML::LibXML::Element' );
$VAR5 = bless( do{\(my $o = 10579752)}, 'XML::LibXML::Text' );
$VAR6 = bless( do{\(my $o = 10565696)}, 'XML::LibXML::Element' );
$VAR7 = bless( do{\(my $o = 13046400)}, 'XML::LibXML::Text' );
#text =>
Location => Chicago
#text =>
Latitude => 30.970
#text =>
Longitude => -90.723
#text =>
Extracting the attributes seems OK. However, extracting the tag names and text gives me extra content. My questions are: where do these XML::LibXML::Text elements come from, and what are the #text things? Thanks,
First of all, you really should put use strict and use warnings at the start of your program, and declare every variable with my at the point of first use. This will reveal a lot of simple mistakes and is especially important in programs you are asking for help with.
As you have been told, the XML::LibXML::Text entries are whitespace-only text nodes. If you want the XML::LibXML parser to ignore them, set the no_blanks option on the parser object.
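To see for yourself that the extra nodes hold nothing but whitespace, here is a small sketch. It assumes the snippet from the question is inlined as a string rather than read from numone.xml, and the variable names are only illustrative. It parses the same text with and without no_blanks and prints the class, name and content of every child of <attributes>, with newlines shown as \n.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Assumption: the question's snippet is inlined here instead of
# being loaded from numone.xml.
my $xml = <<'XML';
<Warehouse>
<Equipment id="ABC001" model="TV" version="3_00">
<attributes>
<Location>Chicago</Location>
<Latitude>30.970</Latitude>
<Longitude>-90.723</Longitude>
</attributes>
</Equipment></Warehouse>
XML

# Parse the same text twice (no_blanks off, then on) and record how
# many children <attributes> ends up with each time.
my %child_count;
for my $no_blanks (0, 1) {
    my $parser = XML::LibXML->new(no_blanks => $no_blanks);
    my $doc    = $parser->load_xml(string => $xml);
    my ($attributes) = $doc->findnodes('//Equipment/attributes');
    my @children = $attributes->childNodes;
    $child_count{$no_blanks} = scalar @children;

    print "no_blanks => $no_blanks\n";
    for my $node (@children) {
        # Make the whitespace visible by printing newlines as "\n".
        (my $shown = $node->textContent) =~ s/\n/\\n/g;
        printf "  %-24s %-10s '%s'\n", ref($node), $node->nodeName, $shown;
    }
}
```

With the default parser you should see seven children, XML::LibXML::Text nodes containing only a newline alternating with the three elements; with no_blanks => 1 only the three XML::LibXML::Element nodes remain.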
Also, you would be better off using the more recent load_xml method instead of the older parse_file, as below:
my $parser = XML::LibXML->new(no_blanks => 1);
my $Chunk = $parser->load_xml(location => "numone.xml");
The output from this changed version of the program looks like this:
id => ABC001
model => TV
version => 3_00
$VAR1 = bless( do{\(my $o = 7008120)}, 'XML::LibXML::Element' );
$VAR2 = bless( do{\(my $o = 7008504)}, 'XML::LibXML::Element' );
$VAR3 = bless( do{\(my $o = 7008144)}, 'XML::LibXML::Element' );
Location => Chicago
Latitude => 30.970
Longitude => -90.723
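Putting the advice from this answer together (use strict, use warnings, my declarations, no_blanks and load_xml), a complete version of the program might look like the sketch below. It is a sketch under stated assumptions: the XML is inlined through load_xml(string => ...) so it runs without numone.xml (use location => "numone.xml" for the real file), the Dumper call is dropped, and the attributes are read with the attributes and value methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Assumption: the document is inlined instead of read from numone.xml.
my $xml = <<'XML';
<Warehouse>
<Equipment id="ABC001" model="TV" version="3_00">
<attributes>
<Location>Chicago</Location>
<Latitude>30.970</Latitude>
<Longitude>-90.723</Longitude>
</attributes>
</Equipment></Warehouse>
XML

my $parser = XML::LibXML->new(no_blanks => 1);
my $doc    = $parser->load_xml(string => $xml);

my @lines;

# Attributes of the first <Equipment> element. In list context,
# attributes() returns the attribute nodes (plus any namespace
# declarations, of which this document has none).
my ($equipment) = $doc->findnodes('//Equipment');
for my $attr ($equipment->attributes) {
    push @lines, $attr->nodeName . ' => ' . $attr->value;
}

# Children of <attributes>. Because of no_blanks there are no
# whitespace-only text nodes left to skip.
my ($attributes) = $doc->findnodes('//Equipment/attributes');
for my $child ($attributes->childNodes) {
    push @lines, $child->nodeName . ' => ' . $child->textContent;
}

print "$_\n" for @lines;
```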
The extra nodes are text nodes that contain only whitespace, e.g., the newlines between elements. Skip them if you want:
@Equipment = $Chunk->findnodes('//Equipment/attributes');
@Attr = $Equipment[0]->childNodes;
foreach $at (@Attr) {
($na,$nv) = ($at->nodeName, $at->textContent);
next if $na eq "#text"; # skip text nodes between elements
print "$na => $nv\n";
}
Output:
id => ABC001
model => TV
version => 3_00
Location => Chicago
Latitude => 30.970
Longitude => -90.723
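Instead of comparing nodeName with "#text", you can ask for element children only. The sketch below (XML inlined as a string, variable names illustrative) keeps the default parser, so the whitespace nodes are still in the tree, and shows two ways to skip them: the relative XPath expression * passed to findnodes, and filtering childNodes by class with isa('XML::LibXML::Element').

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Assumption: the question's snippet is inlined rather than loaded from a file.
my $xml = <<'XML';
<Warehouse>
<Equipment id="ABC001" model="TV" version="3_00">
<attributes>
<Location>Chicago</Location>
<Latitude>30.970</Latitude>
<Longitude>-90.723</Longitude>
</attributes>
</Equipment></Warehouse>
XML

# Default parser settings: whitespace-only text nodes stay in the tree.
my $doc = XML::LibXML->new->load_xml(string => $xml);
my ($attributes) = $doc->findnodes('//Equipment/attributes');

my %fields;

# Approach 1: '*' (child::*) relative to <attributes> selects element
# children only, so text and comment nodes are never returned.
for my $child ($attributes->findnodes('*')) {
    $fields{ $child->nodeName } = $child->textContent;
}

# Approach 2: keep childNodes, but filter on the node's class instead
# of its name; element nodes are XML::LibXML::Element objects.
my @element_names = map  { $_->nodeName }
                    grep { $_->isa('XML::LibXML::Element') } $attributes->childNodes;

print "$_ => $fields{$_}\n" for @element_names;
```

One reason to prefer either of these over the name check is that comments (nodeName #comment) and CDATA sections (#cdata-section) would slip past a test for #text alone.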