
Why does XML::LibXML find no nodes for this XPath query when using a namespace?

I'm attempting to select a node using an XPath query, and I don't understand why XML::LibXML doesn't find the node when it has an xmlns attribute. Here's a script to demonstrate the issue:

#!/usr/bin/perl

use XML::LibXML; # 1.70 on libxml2 from libxml2-dev 2.6.16-7sarge1 (don't ask)
use XML::XPath;  # 1.13
use strict;
use warnings;

use v5.8.4; # don't ask

my ($xpath, $libxml, $use_namespace) = @ARGV;

my $xml = sprintf(<<'END_XML', ($use_namespace ? 'xmlns="http://www.w3.org/2000/xmlns/"' : q{}));
<?xml version="1.0" encoding="iso-8859-1"?>
<RootElement>
  <MyContainer %s>
    <MyField>
        <Name>ID</Name>
        <Value>12345</Value>
    </MyField>
    <MyField>
        <Name>Name</Name>
        <Value>Ben</Value>
    </MyField>
  </MyContainer>
</RootElement>
END_XML

my $xml_parser
    = $libxml ? XML::LibXML->load_xml(string => $xml, keep_blanks => 1)
    :           XML::XPath->new(xml => $xml);

my $nodecount = 0;
foreach my $node ($xml_parser->findnodes($xpath)) {
    $nodecount ++;
    print "--NODE $nodecount--\n"; #would use say on newer perl
    print $node->toString($libxml && 1), "\n";
}

unless ($nodecount) {
    print "NO NODES FOUND\n";
}

This script lets you choose between the XML::LibXML parser and the XML::XPath parser. Depending on the arguments passed, it also either defines an xmlns attribute on the MyContainer element or leaves it off.

The xpath expression I'm using is "RootElement/MyContainer". When I run the query using the XML::LibXML parser without the namespace it finds the node with no problem:

benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' libxml
--NODE 1--
<MyContainer>
    <MyField>
        <Name>ID</Name>
        <Value>12345</Value>
    </MyField>
    <MyField>
        <Name>Name</Name>
        <Value>Ben</Value>
    </MyField>
  </MyContainer>

However, when I run it with the namespace in place it finds no nodes:

benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' libxml use_namespace
NO NODES FOUND

Contrast this with the output when using the XML::XPath parser:

benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' 0 # no namespace
--NODE 1--
<MyContainer>
    <MyField>
        <Name>ID</Name>
        <Value>12345</Value>
    </MyField>
    <MyField>
        <Name>Name</Name>
        <Value>Ben</Value>
    </MyField>
  </MyContainer>
benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' 0 1 # with namespace
--NODE 1--
<MyContainer xmlns="http://www.w3.org/2000/xmlns/">
    <MyField>
        <Name>ID</Name>
        <Value>12345</Value>
    </MyField>
    <MyField>
        <Name>Name</Name>
        <Value>Ben</Value>
    </MyField>
  </MyContainer>

Which of these parser implementations is doing it "right"? Why does XML::LibXML treat it differently when I use a namespace? What can I do to retrieve the node when the namespace is in place?

benrifkah asked Nov 03 '10 01:11



2 Answers

This is a FAQ. XPath considers any unprefixed name in an expression to belong to "no namespace".

Then, the expression:

RootElement/MyContainer

selects all MyContainer elements that belong to "no namespace" and are children of RootElement elements that belong to "no namespace" and are themselves children of the context (current) node. However, once the xmlns attribute is present, MyContainer and all of its descendants belong to the default namespace declared there; only RootElement remains in "no namespace". The MyContainer step therefore matches nothing.

This explains the result you are getting. XML::LibXML is right.

The common solution is that the API of the hosting language lets you bind a prefix of your choice to the namespace by "registering" it. Then one can use an expression like:

RootElement/x:MyContainer

where x is the prefix with which the namespace has been registered. RootElement stays unprefixed here because the declaration sits on MyContainer. If the declaration were on the root element instead, every step would need the prefix: x:RootElement/x:MyContainer.
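For XML::LibXML, the registration step described above goes through XML::LibXML::XPathContext and its registerNs method. The following is a minimal sketch, not a definitive implementation. It assumes a made-up namespace URI (http://example.com/ns) in place of the one in the question, because http://www.w3.org/2000/xmlns/ is reserved by the namespaces specification and newer parsers may reject it as a default namespace. As in the question's document, the xmlns declaration sits on MyContainer, so RootElement is left unprefixed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical namespace URI, chosen only for this sketch.
my $ns_uri = 'http://example.com/ns';

my $xml = <<"END_XML";
<?xml version="1.0" encoding="iso-8859-1"?>
<RootElement>
  <MyContainer xmlns="$ns_uri">
    <MyField>
        <Name>ID</Name>
        <Value>12345</Value>
    </MyField>
    <MyField>
        <Name>Name</Name>
        <Value>Ben</Value>
    </MyField>
  </MyContainer>
</RootElement>
END_XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Bind the prefix "x" to the namespace. The prefix is arbitrary and
# does not have to appear anywhere in the document itself.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(x => $ns_uri);

# RootElement is in no namespace; MyContainer and everything below it
# are in the default namespace, so those steps carry the prefix.
my @containers = $xpc->findnodes('RootElement/x:MyContainer');
my @fields     = $xpc->findnodes('RootElement/x:MyContainer/x:MyField');
my $id         = $xpc->findvalue('//x:MyField[x:Name="ID"]/x:Value');

print "containers: ", scalar(@containers), "\n";
print "fields: ", scalar(@fields), "\n";
print "ID: $id\n";
```

Note that the queries are run on the XPathContext object ($xpc), not on the document; calling findnodes directly on $doc would not see the registered prefix.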

On the very rare occasions where the hosting language doesn't offer a way to register namespaces, use the following expression:

*[name()='RootElement']/*[name()='MyContainer']
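As a rough illustration of this fallback in Perl, the sketch below runs the unprefixed query, the name() form, and a stricter variant directly on the document, with no prefix registered. The namespace URI http://example.com/ns is a made-up placeholder, and the local-name()/namespace-uri() variant is an addition of this sketch, not part of the answer above: it also matches when the document uses a prefix, and it checks the namespace explicitly.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical namespace URI, chosen only for this sketch.
my $xml = <<'END_XML';
<RootElement>
  <MyContainer xmlns="http://example.com/ns">
    <MyField><Name>ID</Name><Value>12345</Value></MyField>
  </MyContainer>
</RootElement>
END_XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Unprefixed names only match elements in no namespace: finds nothing.
my @plain = $doc->findnodes('RootElement/MyContainer');

# name() compares the qualified name as written in the document,
# so it matches here because the elements carry no prefix.
my @by_name = $doc->findnodes(
    q{*[name()='RootElement']/*[name()='MyContainer']}
);

# local-name() ignores any prefix; namespace-uri() pins the namespace.
my @by_local = $doc->findnodes(
    q{*[local-name()='RootElement']}
  . q{/*[local-name()='MyContainer'}
  . q{ and namespace-uri()='http://example.com/ns']}
);

print "plain: ",      scalar(@plain),    "\n";
print "name(): ",     scalar(@by_name),  "\n";
print "local-name: ", scalar(@by_local), "\n";
```

These predicates get long quickly, which is why registering a prefix is usually the more readable choice when the API supports it.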
Dimitre Novatchev answered Oct 06 '22 06:10


@Dimitre is right. Take a look at XML::LibXML::XPathContext, which lets you declare the namespace so that you can then use namespace-aware XPath expressions. I gave an example of using it some time ago on Stack Overflow: have a look at "Why should I use XPathContext with Perl's XML::LibXML".

Nic Gibson answered Oct 06 '22 06:10