Given the following XML snippet:
<outline>
<node1 attribute1="value1" attribute2="value2">
text1
</node1>
</outline>
How do I get this output?
outline
node1=text1
node1 attribute1=value1
node1 attribute2=value2
I have looked into XML::LibXML::Reader, but that module appears to only provide access to attribute values referenced by their names. How do I get the list of attribute names in the first place?
An XML::LibXML::NodeList object contains an ordered list of nodes, as detailed by the W3C DOM documentation of Node Lists. ... You will almost never have to create a new NodeList object, as it is all done for you by XPath. get_nodelist returns the contents of the node list as a Perl list. string_value returns the string-value of the first node in the list.
According to the XML DOM, everything in an XML document is a node: the entire document is a document node, every XML element is an element node, the text in an XML element is a text node, every attribute is an attribute node, and comments are comment nodes.
Because XML::LibXML does not implement namespace declarations and attributes the same way, you have to test what kind of node you are handling when you process the result of $node->attributes. Called in list context, it returns the attribute nodes as a list. In scalar context, it returns an XML::LibXML::NamedNodeMap object.
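As a rough illustration of the point about $node->attributes, here is a minimal sketch of calling it in list and scalar context and checking each entry's node type. The helper name describe_attributes, the sample element, and the urn:example namespace are made up for this example; the constants come from XML::LibXML's :libxml export tag.

```perl
use strict;
use warnings;
use XML::LibXML qw(:libxml);   # XML_ATTRIBUTE_NODE, XML_NAMESPACE_DECL, ...

# Hypothetical helper: describe every entry returned by attributes(),
# telling real attributes apart from namespace declarations.
sub describe_attributes {
    my ($element) = @_;
    my @lines;
    for my $item ($element->attributes) {    # list context: a plain list
        if ($item->nodeType == XML_ATTRIBUTE_NODE) {
            push @lines, sprintf "attribute %s=%s", $item->nodeName, $item->value;
        }
        elsif ($item->nodeType == XML_NAMESPACE_DECL) {
            push @lines, sprintf "namespace %s=%s",
                $item->declaredPrefix, $item->declaredURI;
        }
    }
    return @lines;
}

# Sample input invented for this sketch
my $doc = XML::LibXML->load_xml(
    string => '<node1 xmlns:x="urn:example" attribute1="value1" attribute2="value2"/>'
);
my $element = $doc->documentElement;
print "$_\n" for describe_attributes($element);

# Scalar context: an XML::LibXML::NamedNodeMap instead of a list
my $map = $element->attributes;
print "map holds ", $map->length, " entries\n";
```

If the input never declares namespaces, the node type check is not strictly needed, but it keeps the loop from treating an xmlns declaration as an ordinary attribute.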
Something like this should help you.
It's not clear from your question whether <outline> is the root element of the data, or if it is buried somewhere in a bigger document. It's also unclear how general you want the solution to be; for example, do you want the entire document dumped in this manner?
Anyway, this program generates the output you requested from the given XML input in a fairly concise manner.
use strict;
use warnings;
use 5.014;    # for the /r non-destructive substitution modifier
use XML::LibXML;
my $xml = XML::LibXML->load_xml(IO => \*DATA);
my ($node) = $xml->findnodes('//outline');
print $node->nodeName, "\n";
for my $child ($node->getChildrenByTagName('*')) {
    my $name = $child->nodeName;
    printf "%s=%s\n", $name, $child->textContent =~ s/\A\s+|\s+\z//gr;
    for my $attr ($child->attributes) {
        printf "%s %s=%s\n", $name, $attr->getName, $attr->getValue;
    }
}
__DATA__
<outline>
<node1 attribute1="value1" attribute2="value2">
text1
</node1>
</outline>
output
outline
node1=text1
node1 attribute1=value1
node1 attribute2=value2
You can get the list of attributes with $e->findnodes("./@*").
Below is a solution with plain XML::LibXML, not XML::LibXML::Reader, that works with your test data. It may be sensitive to extra whitespace and mixed content, though, so test it on real data before using it.
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
my $dom= XML::LibXML->load_xml( IO => \*DATA);
# every element in the document, in document order
my $elements= $dom->findnodes( "//*");
foreach my $e (@$elements)
  { print $e->nodeName;
    # text needs to be trimmed or line returns show up in the output
    my $text= $e->textContent;
    $text=~s{^\s*}{};
    $text=~s{\s*$}{};
    # only print text for leaf elements; length() keeps a literal "0"
    if( ! $e->getChildrenByTagName( '*') && length $text)
      { print "=$text"; }
    print "\n";
    my @attrs= $e->findnodes( "./@*");
    # or, as suggested by Borodin, $e->attributes
    foreach my $attr (@attrs)
      { print $e->nodeName, " ", $attr->nodeName, "=", $attr->value, "\n"; }
  }
__END__
<outline>
<node1 attribute1="value1" attribute2="value2">
text1
</node1>
</outline>
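Since the question mentions XML::LibXML::Reader, here is a hedged sketch of how the pull parser can walk an element's attributes without knowing their names in advance, using moveToFirstAttribute and moveToNextAttribute. The helper name attribute_lines_via_reader is invented for this example, and it only covers the attribute lines, not the text content.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;   # exports the XML_READER_TYPE_* constants by default

# Hypothetical helper: collect "element attribute=value" lines with the reader.
sub attribute_lines_via_reader {
    my ($xml_string) = @_;
    my $reader = XML::LibXML::Reader->new(string => $xml_string);
    my @lines;
    while ($reader->read == 1) {
        next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
        my $element = $reader->name;
        # moveToFirstAttribute returns 1 only when the element has an attribute
        if ($reader->moveToFirstAttribute == 1) {
            do {
                # while positioned on an attribute, name and value refer to it
                push @lines, sprintf "%s %s=%s",
                    $element, $reader->name, $reader->value;
            } while ($reader->moveToNextAttribute == 1);
            $reader->moveToElement;   # step back onto the owning element
        }
    }
    return @lines;
}

print "$_\n" for attribute_lines_via_reader(
    '<outline><node1 attribute1="value1" attribute2="value2">text1</node1></outline>'
);
```

The reader treats attributes as positions you move onto rather than as child nodes, which is why the loop steps back with moveToElement before continuing to read.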