I have the following sample XML file:
<a xmlns="http://www.foo.com">
<b>
</b>
</a>
Using the XPath expression /foo:a/foo:b (with 'foo' properly configured in the NamespaceContext), I can correctly count the number of b nodes, and the code works both when Saxon-HE-9.4.jar is on the CLASSPATH and when it's not.
However, when I parse the same file with a namespace-unaware DocumentBuilderFactory, the XPath expression "/a/b" correctly counts the number of b nodes only when Saxon-HE-9.4.jar is not on the CLASSPATH.
Code below:
import java.io.*;
import java.util.*;
import javax.xml.xpath.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import javax.xml.namespace.NamespaceContext;

public class FooMain {

    public static void main(String args[]) throws Exception {
        String xmlSample = "<a xmlns=\"http://www.foo.com\"><b></b></a>";
        {
            XPath xpath = namespaceUnawareXpath();
            System.out.printf("[NS-unaware] Number of 'b' nodes is: %d\n",
                ((NodeList) xpath.compile("/a/b").evaluate(stringToXML(xmlSample, false),
                    XPathConstants.NODESET)).getLength());
        }
        {
            XPath xpath = namespaceAwareXpath("foo", "http://www.foo.com");
            System.out.printf("[NS-aware ] Number of 'b' nodes is: %d\n",
                ((NodeList) xpath.compile("/foo:a/foo:b").evaluate(stringToXML(xmlSample, true),
                    XPathConstants.NODESET)).getLength());
        }
    }

    public static XPath namespaceUnawareXpath() {
        XPathFactory xPathfactory = XPathFactory.newInstance();
        XPath xpath = xPathfactory.newXPath();
        return xpath;
    }

    public static XPath namespaceAwareXpath(final String prefix, final String nsURI) {
        XPathFactory xPathfactory = XPathFactory.newInstance();
        XPath xpath = xPathfactory.newXPath();
        NamespaceContext ctx = new NamespaceContext() {
            @Override
            public String getNamespaceURI(String aPrefix) {
                if (aPrefix.equals(prefix))
                    return nsURI;
                else
                    return null;
            }
            @Override
            public Iterator getPrefixes(String val) {
                throw new UnsupportedOperationException();
            }
            @Override
            public String getPrefix(String uri) {
                throw new UnsupportedOperationException();
            }
        };
        xpath.setNamespaceContext(ctx);
        return xpath;
    }

    private static Document stringToXML(String s, boolean nsAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(nsAware);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(s.getBytes("UTF-8")));
    }
}
Running the above with:
java -classpath dist/foo.jar FooMain
... produces:
[NS-unaware] Number of 'b' nodes is: 1
[NS-aware ] Number of 'b' nodes is: 1
Running with:
java -classpath Saxon-HE-9.4.jar:dist/foo.jar FooMain
... produces:
[NS-unaware] Number of 'b' nodes is: 0
[NS-aware ] Number of 'b' nodes is: 1
Correct observation. Saxon doesn't work with a namespace-unaware DOM. There's no reason why it should. If you can find an XSLT/XPath processor that works with a namespace-unaware DOM, then go ahead and use it if you want, but its behaviour isn't defined by any standard.
If it were possible for Saxon to detect that the DOM is namespace-unaware, then it would throw an error rather than giving spurious results. Sadly, one of DOM's many design failings is that if you didn't create the DOM yourself, you can't tell whether it's namespace-aware or not.
Your comment "I need to be lenient on namespaces since I have to handle 3rd-party XML instances that are not always XSD valid." is a complete non-sequitur. It's true that a document can't be XSD-valid unless it is namespace-valid, but the converse is not true; loads of documents are namespace-valid without being XSD-valid.
Finally, as your experience shows, relying on the JAXP mechanism to load whatever XPath processor happens to be lying around on the classpath is very error-prone. You can't even control whether you get an XPath 1.0 or 2.0 processor by this mechanism (and again, you can't find out easily which you have got). If your code is dependent on the quirks of a particular XPath implementation then you need to load that implementation explicitly rather than relying on the JAXP search.
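One way to avoid the classpath search is sketched below. It assumes Java 9 or later, where XPathFactory.newDefaultInstance() and DocumentBuilderFactory.newDefaultInstance() return the JDK's built-in implementations regardless of what else is on the classpath. The class name ExplicitXPathFactory and the helper countB are made up for illustration; they are not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class ExplicitXPathFactory {

    // Hypothetical helper: counts /foo:a/foo:b with 'foo' bound to http://www.foo.com.
    static int countB(String xml) throws Exception {
        // Built-in JDK parser (Java 9+), switched to namespace-aware mode.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newDefaultInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Built-in JDK XPath engine (Java 9+); no JAXP provider lookup takes place.
        XPath xpath = XPathFactory.newDefaultInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "foo".equals(prefix) ? "http://www.foo.com" : XMLConstants.NULL_NS_URI;
            }
            @Override
            public String getPrefix(String uri) {
                throw new UnsupportedOperationException();
            }
            @Override
            public Iterator<String> getPrefixes(String uri) {
                throw new UnsupportedOperationException();
            }
        });

        NodeList nodes = (NodeList) xpath.evaluate("/foo:a/foo:b", doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countB("<a xmlns=\"http://www.foo.com\"><b></b></a>"));
    }
}
```

If Saxon is the processor you actually want, the same idea applies in the other direction: construct Saxon's own factory class directly instead of going through XPathFactory.newInstance().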
UPDATE (Sep 2015): Saxon 9.6 no longer includes the META-INF services file that advertises it as a JAXP XPath provider. This means you will never pick up Saxon as your XPath processor simply because it is on the classpath: you have to ask for it explicitly.
The XPath language is only defined on namespace-well-formed XML, so the behaviour of different processors on a non-namespace-aware DOM tree (even one like <a><b/></a> that, had it been parsed in a namespace-aware manner, would not actually use any namespaces) is at best implementation-specific and at worst completely undefined.
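If the goal is to match elements regardless of their namespace, one option that stays within defined XPath behaviour is to parse namespace-aware and compare local names only. The following is a minimal sketch of that idea, not something taken from the answers above; the class name LocalNameCount and the helper countB are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class LocalNameCount {

    // Hypothetical helper: counts b children of a root a, ignoring namespaces.
    static int countB(String xml) throws Exception {
        // Always parse namespace-aware so the tree is one XPath is defined on.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // local-name() compares only the unprefixed part of each element name,
        // so no NamespaceContext is needed.
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                "/*[local-name()='a']/*[local-name()='b']", doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countB("<a xmlns=\"http://www.foo.com\"><b></b></a>"));
        System.out.println(countB("<a><b/></a>"));
    }
}
```

The trade-off is that this expression also matches a and b elements from unrelated namespaces, so it only suits cases where that leniency is really intended.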