I'm attempting to select a node using an XPath query and I don't understand why XML::LibXML doesn't find the node when it has an xmlns atribute. Here's a script to demonstrate the issue:
#!/usr/bin/perl
use XML::LibXML; # 1.70 on libxml2 from libxml2-dev 2.6.16-7sarge1 (don't ask)
use XML::XPath; # 1.13
use strict;
use warnings;
use v5.8.4; # don't ask
my ($xpath, $libxml, $use_namespace) = @ARGV;
my $xml = sprintf(<<'END_XML', ($use_namespace ? 'xmlns="http://www.w3.org/2000/xmlns/"' : q{}));
<?xml version="1.0" encoding="iso-8859-1"?>
<RootElement>
<MyContainer %s>
<MyField>
<Name>ID</Name>
<Value>12345</Value>
</MyField>
<MyField>
<Name>Name</Name>
<Value>Ben</Value>
</MyField>
</MyContainer>
</RootElement>
END_XML
my $xml_parser
= $libxml ? XML::LibXML->load_xml(string => $xml, keep_blanks => 1)
: XML::XPath->new(xml => $xml);
my $nodecount = 0;
foreach my $node ($xml_parser->findnodes($xpath)) {
$nodecount ++;
print "--NODE $nodecount--\n"; #would use say on newer perl
print $node->toString($libxml && 1), "\n";
}
unless ($nodecount) {
print "NO NODES FOUND\n";
}
This script allows you to chose between the XML::LibXML parser and the XML::XPath parser. It also allows you to define an xmlns attribute on the MyContainer element or leave it off depending on the arguments passed.
The xpath expression I'm using is "RootElement/MyContainer". When I run the query using the XML::LibXML parser without the namespace it finds the node with no problem:
benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' libxml
--NODE 1--
<MyContainer>
<MyField>
<Name>ID</Name>
<Value>12345</Value>
</MyField>
<MyField>
<Name>Name</Name>
<Value>Ben</Value>
</MyField>
</MyContainer>
However, when I run it with the namespace in place it finds no nodes:
benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' libxml use_namespace
NO NODES FOUND
Contrast this with the output when using the XMLL::XPath parser:
benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' 0 # no namespace
--NODE 1--
<MyContainer>
<MyField>
<Name>ID</Name>
<Value>12345</Value>
</MyField>
<MyField>
<Name>Name</Name>
<Value>Ben</Value>
</MyField>
</MyContainer>
benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' 0 1 # with namespace
--NODE 1--
<MyContainer xmlns="http://www.w3.org/2000/xmlns/">
<MyField>
<Name>ID</Name>
<Value>12345</Value>
</MyField>
<MyField>
<Name>Name</Name>
<Value>Ben</Value>
</MyField>
</MyContainer>
Which of these parser implementations is doing it "right"? Why does XML::LibXML treat it differently when I use a namespace? What can I do to retrieve the node when the namespace is in place?
XPath Queries and Namespaces. XPath queries are aware of namespaces in an XML document and can use namespace prefixes to qualify element and attribute names. Qualifying element and attribute names with a namespace prefix limits the nodes returned by an XPath query to only those nodes that belong to a specific namespace.
It works! Namespace prefix "x" now refers to the default namespace in the document. There is a way to specify the namespace URI as part of the XPath, without using this prefix feature, but the resulting XPaths are enormous and hard for humans to read.
XPath is a major element in the XSLT standard. XPath can be used to navigate through elements and attributes in an XML document.
The XmlNamespaceManager object may be used in the query in each of the following ways. The XmlNamespaceManager object is associated with an existing XPathExpression object by using the SetContext method of the XPathExpression object.
This is a FAQ. XPath considers any unprefixed name in an expression to belong to "no namespace".
Then, the expression:
RootElement/MyContainer
selects all MyContainer
elements that belong to "no namespace" and are children of all RootElement
elements that belong to "no namespace" and are children of the context (current node). However, there are no elements at all in the whole document that belong to "no namespace" -- all elements belong to the default namespace.
This explains the result you are getting. XML::LibXML is right.
The common solution is that the API of the hosting language allows a specific prefix to be bound to the namespace by "registering" a namespace. Then one can use an expression like:
x:RootElement/x:MyContainer
where x
is the prefix with which the namespace has been registered.
In the very rare occasions where the hosting language doesn't offer registering namespaces, use the following expression:
*[name()='RootElement']/*[name()='MyContainer']
@Dmitre is right. You need to take a look at XML::LibXML::XPathContext which will allow you to declare the namespace and then you can use namespace aware XPath statements. I gave an example of using this some time ago on stackoverflow - have a look at Why should I use XPathContext with Perl's XML::LibXML
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With