
How to list XML node attributes with XML::LibXML?

Given the following XML snippet:

<outline>
  <node1 attribute1="value1" attribute2="value2">
    text1
  </node1>
</outline>

How do I get this output?

outline
node1=text1
node1 attribute1=value1
node1 attribute2=value2

I have looked into XML::LibXML::Reader, but that module appears to provide access only to attribute values referenced by name. How do I get the list of attribute names in the first place?

asked Nov 07 '14 07:11 by Alexander Shcheblikin



2 Answers

Something like this should help you.

It's not clear from your question whether <outline> is the root element of the data, or if it is buried somewhere in a bigger document. It's also unclear how general you want the solution to be - e.g. do you want the entire document dumped in this manner?

Anyway, this program generates the output you requested from the given XML input in a fairly concise manner.

use strict;
use warnings;
use 5.014;     # for the /r (non-destructive) substitution modifier

use XML::LibXML;

my $xml = XML::LibXML->load_xml(IO => \*DATA);

my ($node) = $xml->findnodes('//outline');

print $node->nodeName, "\n";

for my $child ($node->getChildrenByTagName('*')) {
  my $name = $child->nodeName;

  printf "%s=%s\n", $name, $child->textContent =~ s/\A\s+|\s+\z//gr;

  for my $attr ($child->attributes) {
    printf "%s %s=%s\n", $name, $attr->getName, $attr->getValue;
  }
}

__DATA__
<outline>
  <node1 attribute1="value1" attribute2="value2">
    text1
  </node1>
</outline>

output

outline
node1=text1
node1 attribute1=value1
node1 attribute2=value2
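
One caveat when generalizing this: according to the XML::LibXML documentation, attributes returns namespace declarations (xmlns:...) along with ordinary attributes, so on namespaced documents you may want to filter by node type. The following is only a sketch of that filter. The helper name attribute_pairs and the xmlns:y declaration in the sample data are illustrative additions, not part of the question.

```perl
use strict;
use warnings;

use XML::LibXML;   # exports the XML_*_NODE node-type constants by default

# Hypothetical helper (not from the answer): return [name, value] pairs for
# the ordinary attributes of an element, skipping xmlns declarations.
sub attribute_pairs {
  my ($element) = @_;

  return map  { [ $_->nodeName, $_->value ] }
         grep { $_->nodeType == XML_ATTRIBUTE_NODE }
         $element->attributes;
}

# Sample data with an added namespace declaration (an assumption, used only
# to show what the filter skips).
my $doc = XML::LibXML->load_xml(string => <<'XML');
<outline>
  <node1 xmlns:y="urn:example:y" attribute1="value1" attribute2="value2">text1</node1>
</outline>
XML

my ($node1) = $doc->findnodes('//node1');

printf "%s %s=%s\n", $node1->nodeName, @$_ for attribute_pairs($node1);
```

The XPath expression ./@* should not need this filter, because namespace declarations are not attributes in the XPath data model.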
answered Oct 11 '22 21:10 by Borodin


You can get the list of attributes of an element $e with $e->findnodes( "./@*");

Below is a solution that uses plain XML::LibXML, not XML::LibXML::Reader, and works with your test data. It may be sensitive to extra whitespace and mixed content, though, so test it on real data before using it.

#!/usr/bin/perl

use strict;
use warnings;

use XML::LibXML;

my $dom= XML::LibXML->load_xml( IO => \*DATA);
my $elements= $dom->findnodes( "//*");

foreach my $e (@$elements)
  { print $e->nodeName;

    # text needs to be trimmed or line returns show up in the output
    my $text= $e->textContent;
    $text=~s{^\s*}{};
    $text=~s{\s*$}{};

    if( ! $e->getChildrenByTagName( '*') && $text)
      { print "=$text"; }
    print "\n"; 

    my @attrs= $e->findnodes( "./@*");
    # or, as Borodin suggests, $e->attributes

    foreach my $attr (@attrs)
      { print $e->nodeName, " ", $attr->nodeName, "=", $attr->value, "\n"; }
  }
__END__
<outline>
  <node1 attribute1="value1" attribute2="value2">
    text1
  </node1>
</outline>
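
Since the question asks about XML::LibXML::Reader specifically: the pull reader can also step through attributes without knowing their names in advance, using moveToFirstAttribute and moveToNextAttribute. Here is a minimal sketch that prints only the attribute lines, not the element names or text. The sub name attribute_lines is a hypothetical helper, and the XML is passed as a string rather than read from DATA.

```perl
use strict;
use warnings;

use XML::LibXML::Reader;   # exports the XML_READER_TYPE_* constants

# Hypothetical helper: return one "element attribute=value" line per attribute.
sub attribute_lines {
  my ($xml) = @_;

  my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";

  my @lines;

  while ($reader->read) {
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;

    my $element = $reader->name;

    # moveToFirstAttribute returns 1 only if the element has an attribute.
    if ($reader->moveToFirstAttribute == 1) {
      do {
        push @lines, sprintf "%s %s=%s", $element, $reader->name, $reader->value;
      } while ($reader->moveToNextAttribute == 1);

      $reader->moveToElement;   # reposition on the owning element
    }
  }

  return @lines;
}

my $xml = <<'XML';
<outline>
  <node1 attribute1="value1" attribute2="value2">
    text1
  </node1>
</outline>
XML

print "$_\n" for attribute_lines($xml);
```

This keeps the streaming behaviour of the reader, which may matter for documents too large to load as a DOM. For small inputs the DOM versions above are simpler.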
answered Oct 11 '22 20:10 by mirod