As part of a Java 6 application, I want to find all namespace declarations in an XML document, including any duplicates.
Edit: Per Martin's request, here's the Java code I am using:
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
XPathExpression xPathExpression = xPath.compile("//namespace::*");
NodeList nodeList = (NodeList) xPathExpression.evaluate(xmlDomDocument, XPathConstants.NODESET);
Suppose I have this XML document:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:ele="element.com" xmlns:att="attribute.com" xmlns:txt="textnode.com">
<ele:one>a</ele:one>
<two att:c="d">e</two>
<three>txt:f</three>
</root>
To find all namespace declarations, I applied this XPath statement to the XML document using XPath 1.0:
//namespace::*
It finds 4 namespace declarations, which is what I expect (and desire):
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
But if I switch to XPath 2.0, I get 16 namespace declarations (each of the previous declarations 4 times), which is not what I expect (or desire):
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
The same difference appears even when I use the unabbreviated form of the XPath statement:
/descendant-or-self::node()/namespace::*
And it is seen across a variety of XML parsers (LIBXML, MSXML.NET, Saxon) as tested in oXygen. (Edit: As I mention later in the comments, this statement is not true. Though I thought I was testing a variety of XML parsers, I really wasn't.)
Question #1: Why do XPath 1.0 and XPath 2.0 give different results?
Question #2: Is it possible/reasonable to get the desired results using XPath 2.0?
Hint: Using the distinct-values() function in XPath 2.0 will not return the desired results, because I want all namespace declarations, even if the same namespace is declared twice. For example, consider this XML document:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<bar:one xmlns:bar="http://www.bar.com">alpha</bar:one>
<bar:two xmlns:bar="http://www.bar.com">bravo</bar:two>
</root>
The desired result is:
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/bar:one[1]/@xmlns:bar - http://www.bar.com
/root[1]/bar:two[1]/@xmlns:bar - http://www.bar.com
I think this will get all namespaces, without any duplicates:
for $i in 1 to count(//namespace::*) return
if (empty(index-of((//namespace::*)[position() = (1 to ($i - 1))][name() = name((//namespace::*)[$i])], (//namespace::*)[$i])))
then (//namespace::*)[$i]
else ()
To find all namespace declarations, I applied this xPath statement to the XML document using xPath 1.0:
//namespace::*
It finds 4 namespace declarations, which is what I expect (and desire):
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
You are using a non-compliant (buggy) XPath 1.0 implementation.
I get different and correct results with all XSLT 1.0 processors I have. This transformation (just evaluating the XPath expression and printing one line for each selected namespace node):
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:for-each select="//namespace::*">
<xsl:value-of select="concat(name(), ': ', ., '&#xA;')"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
when applied on the provided XML document:
<root xmlns:ele="element.com" xmlns:att="attribute.com" xmlns:txt="textnode.com">
<ele:one>a</ele:one>
<two att:c="d">e</two>
<three>txt:f</three>
</root>
produces a correct result:
xml: http://www.w3.org/XML/1998/namespace
ele: element.com
att: attribute.com
txt: textnode.com
xml: http://www.w3.org/XML/1998/namespace
ele: element.com
att: attribute.com
txt: textnode.com
xml: http://www.w3.org/XML/1998/namespace
ele: element.com
att: attribute.com
txt: textnode.com
xml: http://www.w3.org/XML/1998/namespace
ele: element.com
att: attribute.com
txt: textnode.com
with all of these XSLT 1.0 and XSLT 2.0 processors:
MSXML3, MSXML4, MSXML6, .NET XslCompiledTransform, .NET XslTransform, Altova (XML SPY), Saxon 6.5.4, Saxon 9.1.07, XQSharp.
Here is a short C# program that confirms the number of nodes selected in .NET is 16:
namespace TestNamespaces
{
using System;
using System.IO;
using System.Xml.XPath;
class Test
{
static void Main(string[] args)
{
string xml =
@"<root xmlns:ele='element.com' xmlns:att='attribute.com' xmlns:txt='textnode.com'>
<ele:one>a</ele:one>
<two att:c='d'>e</two>
<three>txt:f</three>
</root>";
XPathDocument doc = new XPathDocument(new StringReader(xml));
double count =
(double) doc.CreateNavigator().Evaluate("count(//namespace::*)");
Console.WriteLine(count);
}
}
}
The result is: 16.
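The count of 16 follows from the XPath 1.0 data model: every element has one namespace node for each prefix that is in scope on it, including the implicit xml prefix. The sample document has 4 elements and 4 in-scope bindings on each, hence 4 x 4 = 16. The following is only a rough Java sketch of that reasoning. It walks the DOM instead of evaluating XPath, and the class and method names (InScopeNamespaces, countNamespaceNodes) are made up for illustration.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of any library.
class InScopeNamespaces {

    // Sums, over all elements, the number of namespace bindings in scope on each one.
    public static int countNamespaceNodes(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Map<String, String> initial = new HashMap<String, String>();
            // The xml prefix is bound implicitly on every element.
            initial.put("xml", "http://www.w3.org/XML/1998/namespace");
            return count(root, initial);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static int count(Element element, Map<String, String> inherited) {
        // Start from the parent's bindings, then apply this element's declarations.
        Map<String, String> scope = new HashMap<String, String>(inherited);
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            String name = attr.getName();
            if (name.equals("xmlns") || name.startsWith("xmlns:")) {
                String prefix = name.equals("xmlns") ? "" : name.substring(6);
                if (attr.getValue().length() == 0) {
                    scope.remove(prefix); // an empty value undeclares the binding
                } else {
                    scope.put(prefix, attr.getValue());
                }
            }
        }
        int total = scope.size();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                total += count((Element) child, scope);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String xml = "<root xmlns:ele='element.com' xmlns:att='attribute.com' xmlns:txt='textnode.com'>"
                + "<ele:one>a</ele:one><two att:c='d'>e</two><three>txt:f</three></root>";
        System.out.println(countNamespaceNodes(xml)); // prints 16
    }
}
```

The code uses only explicit generics and plain loops, so it should also compile on Java 6.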
UPDATE:
This is an XPath 2.0 expression that finds just the "distinct" namespace nodes and produces a line of name - value pairs for each of them:
for $i in distinct-values(
for $ns in //namespace::*
return
index-of(
(for $x in //namespace::*
return
concat(name($x), ' ', string($x))
),
concat(name($ns), ' ', string($ns))
)
[1]
)
return
for $x in (//namespace::*)[$i]
return
concat(name($x), ' :', string($x), '
')
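If what is wanted is the namespace declarations themselves (the xmlns attributes as written, duplicates included) rather than in-scope namespace nodes, one alternative is to skip XPath and walk the DOM. This is a different technique from the expressions above, offered only as a sketch. NamespaceDeclarations and findDeclarations are hypothetical names, and the implicit xml binding is not reported because it is never declared in the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of any library.
class NamespaceDeclarations {

    // Returns one "path/@xmlns:prefix - uri" line per explicit declaration.
    public static List<String> findDeclarations(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> result = new ArrayList<String>();
            collect(root, "", result);
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static void collect(Element element, String parentPath, List<String> result) {
        String path = parentPath + "/" + element.getNodeName() + "[" + position(element) + "]";
        NamedNodeMap attrs = element.getAttributes();
        // Note: the DOM does not guarantee attribute order within one element.
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            String name = attr.getName();
            if (name.equals("xmlns") || name.startsWith("xmlns:")) {
                result.add(path + "/@" + name + " - " + attr.getValue());
            }
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                collect((Element) child, path, result);
            }
        }
    }

    // 1-based position among preceding siblings with the same qualified name.
    private static int position(Element element) {
        int pos = 1;
        for (Node s = element.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE
                    && s.getNodeName().equals(element.getNodeName())) {
                pos++;
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        String xml = "<root><bar:one xmlns:bar='http://www.bar.com'>alpha</bar:one>"
                + "<bar:two xmlns:bar='http://www.bar.com'>bravo</bar:two></root>";
        for (String line : findDeclarations(xml)) {
            System.out.println(line);
        }
    }
}
```

For the two-element bar document from the question, this reports the declaration on bar:one and the one on bar:two separately, which is the duplicate-preserving behaviour asked for.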