Find all namespace declarations in an XML document - XPath 1.0 vs XPath 2.0

As part of a Java 6 application, I want to find all namespace declarations in an XML document, including any duplicates.

Edit: Per Martin's request, here's the Java code I am using:

import javax.xml.xpath.*;
import org.w3c.dom.NodeList;

XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
XPathExpression xPathExpression = xPath.compile("//namespace::*");
// xmlDomDocument is the org.w3c.dom.Document being searched
NodeList nodeList = (NodeList) xPathExpression.evaluate(xmlDomDocument, XPathConstants.NODESET);

Suppose I have this XML document:

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:ele="element.com" xmlns:att="attribute.com" xmlns:txt="textnode.com">
    <ele:one>a</ele:one>
    <two att:c="d">e</two>
    <three>txt:f</three>
</root>

To find all namespace declarations, I applied this XPath statement to the XML document using XPath 1.0:

//namespace::*

It finds 4 namespace declarations, which is what I expect (and desire):

/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace

But if I switch to XPath 2.0, I get 16 namespace declarations (each of the previous declarations 4 times), which is not what I expect (or desire):

/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com

The same difference appears even when I use the unabbreviated form of the XPath statement:

/descendant-or-self::node()/namespace::*

It also appears across a variety of XML parsers (LIBXML, MSXML.NET, Saxon), as tested in oXygen. (Edit: As I mention later in the comments, this statement is not true. Although I thought I was testing a variety of XML parsers, I really wasn't.)

Question #1: Why do the results differ between XPath 1.0 and XPath 2.0?

Question #2: Is it possible, and reasonable, to get the desired results using XPath 2.0?

Hint: Using the distinct-values() function in XPath 2.0 will not return the desired results, because I want all namespace declarations, even if the same namespace is declared twice. For example, consider this XML document:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <bar:one xmlns:bar="http://www.bar.com">alpha</bar:one>
    <bar:two xmlns:bar="http://www.bar.com">bravo</bar:two>
</root>

The desired result is:

/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/bar:one[1]/@xmlns:bar - http://www.bar.com
/root[1]/bar:two[1]/@xmlns:bar - http://www.bar.com
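For comparison, outside XPath the DOM still holds the declarations themselves as xmlns attributes, so this result can be produced directly in Java. The sketch below is only an illustration under stated assumptions: the class NamespaceDeclarations and its methods are hypothetical, the document is parsed with a namespace-aware DocumentBuilderFactory (so xmlns attributes carry the http://www.w3.org/2000/xmlns/ namespace URI), and the implicitly bound xml prefix, which is never actually declared, is added once for the root element purely to match the desired output above. It sticks to APIs available in Java 6.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: lists every xmlns / xmlns:prefix attribute, duplicates included.
final class NamespaceDeclarations {

    static List<String> find(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // xmlns attributes then report the XMLNS namespace URI
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (SAXException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
        Element root = doc.getDocumentElement();
        String rootPath = "/" + root.getNodeName() + "[1]";
        List<String> out = new ArrayList<String>();
        // Assumption: report the implicit xml binding once, on the root, as in the desired output.
        out.add(rootPath + "/@xmlns:xml - " + XMLConstants.XML_NS_URI);
        collect(root, rootPath, out);
        return out;
    }

    private static void collect(Element element, String path, List<String> out) {
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Attr attr = (Attr) attributes.item(i);
            // Namespace declarations live in the reserved http://www.w3.org/2000/xmlns/ namespace.
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())) {
                out.add(path + "/@" + attr.getName() + " - " + attr.getValue());
            }
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                String childPath = path + "/" + child.getNodeName() + "[" + position(child) + "]";
                collect((Element) child, childPath, out);
            }
        }
    }

    // 1-based index among preceding element siblings with the same qualified name.
    private static int position(Node node) {
        int pos = 1;
        for (Node sib = node.getPreviousSibling(); sib != null; sib = sib.getPreviousSibling()) {
            if (sib.getNodeType() == Node.ELEMENT_NODE && sib.getNodeName().equals(node.getNodeName())) {
                pos++;
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        String xml = "<root>"
                + "<bar:one xmlns:bar='http://www.bar.com'>alpha</bar:one>"
                + "<bar:two xmlns:bar='http://www.bar.com'>bravo</bar:two>"
                + "</root>";
        for (String line : find(xml)) {
            System.out.println(line);
        }
    }
}
```

Run against the second sample document, main prints the three lines of the desired result. Because the attributes are read directly instead of being inferred from in-scope namespaces, a redundant redeclaration (the same prefix and URI declared again on a descendant) is also reported.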
asked Apr 18 '12 at 12:04 by james.garriss


2 Answers

I think this will get all namespaces, without any duplicates:

for $i in 1 to count(//namespace::*)
return
  if (empty(index-of(
        (//namespace::*)[position() = (1 to ($i - 1))][name() = name((//namespace::*)[$i])],
        (//namespace::*)[$i])))
  then (//namespace::*)[$i]
  else ()
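To make the filter in this expression concrete, here is a Java sketch of the same first-occurrence rule. It is an illustration only: the class FirstOccurrence and its helpers are hypothetical, and the input is a hard-coded list of {prefix, URI} pairs standing in for the sequence //namespace::* returns, not something obtained from an XPath engine. A namespace node is kept only if no earlier node has the same name (prefix) and the same string value (URI), which is what the empty(index-of(...)) test checks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: keep a namespace node only if no earlier node
// has the same name (prefix) and the same value (URI).
final class FirstOccurrence {

    static List<String> keepFirst(List<String[]> namespaceNodes) {
        Set<List<String>> seen = new HashSet<List<String>>();
        List<String> kept = new ArrayList<String>();
        for (String[] node : namespaceNodes) {
            // Arrays.asList gives value-based equals/hashCode, so it works as a set key.
            if (seen.add(Arrays.asList(node[0], node[1]))) {
                kept.add(node[0] + ": " + node[1]);
            }
        }
        return kept;
    }

    // Turns ("p1", "uri1", "p2", "uri2", ...) into {prefix, uri} pairs; an odd trailing item is ignored.
    static List<String[]> pairs(String... flat) {
        List<String[]> result = new ArrayList<String[]>();
        for (int i = 0; i + 1 < flat.length; i += 2) {
            result.add(new String[] {flat[i], flat[i + 1]});
        }
        return result;
    }

    public static void main(String[] args) {
        String xmlNs = "http://www.w3.org/XML/1998/namespace";
        List<String[]> nodes = new ArrayList<String[]>();
        // First sample document: the same 4 namespace nodes on each of its 4 elements.
        for (int element = 0; element < 4; element++) {
            nodes.addAll(pairs("xml", xmlNs, "ele", "element.com", "att", "attribute.com", "txt", "textnode.com"));
        }
        for (String line : keepFirst(nodes)) {
            System.out.println(line);
        }
    }
}
```

For the first sample document this keeps 4 entries (xml, ele, att, txt). Fed the sequence for the second sample document (xml on root; xml and bar on each of bar:one and bar:two), the same rule keeps only 2 entries, because both bar declarations have the same prefix and URI.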
answered Oct 06 '22 at 08:10 by Roger Costello


To find all namespace declarations, I applied this XPath statement to the XML document using XPath 1.0:

//namespace::*

It finds 4 namespace declarations, which is what I expect (and desire):

/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com 
/root[1]/@xmlns:txt - textnode.com 
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace

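You are using a non-compliant (buggy) XPath 1.0 implementation. In XPath 1.0, as in XPath 2.0, the namespace axis selects every namespace node in scope for an element, not only the namespaces that element declares. All 4 elements in this document (root, ele:one, two and three) have the same 4 in-scope namespaces (xml, ele, att and txt), so //namespace::* must select 4 × 4 = 16 namespace nodes.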

I get different, and correct, results with every XSLT processor I have. This transformation (which just evaluates the XPath expression and prints one line for each selected namespace node):

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
     <xsl:for-each select="//namespace::*">
       <xsl:value-of select="concat(name(), ': ', ., '&#xA;')"/>
     </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

when applied to the provided XML document:

<root xmlns:ele="element.com" xmlns:att="attribute.com" xmlns:txt="textnode.com">
    <ele:one>a</ele:one>
    <two att:c="d">e</two>
    <three>txt:f</three>
</root>

produces a correct result:

xml: http://www.w3.org/XML/1998/namespace
ele: element.com
att: attribute.com
txt: textnode.com
xml: http://www.w3.org/XML/1998/namespace
ele: element.com
att: attribute.com
txt: textnode.com
xml: http://www.w3.org/XML/1998/namespace
ele: element.com
att: attribute.com
txt: textnode.com
xml: http://www.w3.org/XML/1998/namespace
ele: element.com
att: attribute.com
txt: textnode.com

with all of these XSLT 1.0 and XSLT 2.0 processors:

MSXML3, MSXML4, MSXML6, .NET XslCompiledTransform, .NET XslTransform, Altova (XML SPY), Saxon 6.5.4, Saxon 9.1.07, XQSharp.

Here is a short C# program that confirms the number of nodes selected in .NET is 16:

namespace TestNamespaces
{
    using System;
    using System.IO;
    using System.Xml.XPath;

    class Test
    {
        static void Main(string[] args)
        {
            string xml =
@"<root xmlns:ele='element.com' xmlns:att='attribute.com' xmlns:txt='textnode.com'>
    <ele:one>a</ele:one>
    <two att:c='d'>e</two>
    <three>txt:f</three>
</root>";
            XPathDocument doc = new XPathDocument(new StringReader(xml));

            double count = 
              (double) doc.CreateNavigator().Evaluate("count(//namespace::*)");

            Console.WriteLine(count);
        }
    }
}

The result is:

16
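In the question's Java setting, the same number can be checked without depending on how a particular JAXP XPath engine handles the namespace axis. The sketch below (the class NamespaceNodeCount is hypothetical) models the XPath 1.0 data-model rule directly instead of evaluating //namespace::*: each element gets one namespace node per in-scope prefix, including the implicit xml prefix, with bindings inherited from the parent element and updated by the element's own xmlns attributes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: computes what count(//namespace::*) should return by
// counting in-scope namespace bindings on every element.
final class NamespaceNodeCount {

    static int count(String xml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (SAXException e) {
            throw new IllegalArgumentException(e);
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
        Map<String, String> initial = new LinkedHashMap<String, String>();
        initial.put("xml", XMLConstants.XML_NS_URI); // bound implicitly on every element
        return countFrom(root, initial);
    }

    private static int countFrom(Element element, Map<String, String> inherited) {
        // Start from the parent's bindings, then apply this element's own xmlns attributes.
        Map<String, String> scope = new LinkedHashMap<String, String>(inherited);
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Attr attr = (Attr) attributes.item(i);
            if (!XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())) {
                continue;
            }
            // "xmlns" declares the default namespace (key ""); "xmlns:p" declares prefix p.
            String prefix = "xmlns".equals(attr.getName()) ? "" : attr.getLocalName();
            if (attr.getValue().length() == 0) {
                scope.remove(prefix); // xmlns="" takes the default namespace out of scope
            } else {
                scope.put(prefix, attr.getValue());
            }
        }
        int total = scope.size(); // one namespace node per in-scope binding
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                total += countFrom((Element) child, scope);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String xml = "<root xmlns:ele='element.com' xmlns:att='attribute.com' xmlns:txt='textnode.com'>"
                + "<ele:one>a</ele:one><two att:c='d'>e</two><three>txt:f</three></root>";
        System.out.println(count(xml));
    }
}
```

For the first sample document, main prints 16 (4 elements with 4 in-scope namespaces each), the same count as the C# program. For the question's second sample document the method returns 5: one namespace node on root and two on each of the bar: elements.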

UPDATE:

This is an XPath 2.0 expression that finds just the "distinct" namespace nodes and produces one line with the name and value of each of them:

for $i in distinct-values(
             for $ns in //namespace::*
             return
               index-of(
                 (for $x in //namespace::*
                  return concat(name($x), ' ', string($x))),
                 concat(name($ns), ' ', string($ns))
               )[1]
           )
return
  for $x in (//namespace::*)[$i]
  return concat(name($x), ' :', string($x), '&#xA;')
answered Oct 06 '22 at 08:10 by Dimitre Novatchev