I am using javax.xml.xpath to search for specific strings in XML files. However, because of the huge number of XML files that need to be searched, this is turning out to be much slower than expected.
Is there any API that Java supports that is faster than javax.xml.xpath, or which is the fastest one available?
As pointed out by skaffman, you will want to be sure you are using the javax.xml.xpath libraries as efficiently as possible. If you are executing an XPath statement more than once, make sure to compile it into an XPathExpression.
    XPathExpression xPathExpression = xPath.compile("/root/device/modelname");
    nl = (NodeList) xPathExpression.evaluate(dDoc, XPathConstants.NODESET);
Demo
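In the example below, option #2 will be faster than option #1 because the expression is compiled once and then reused, whereas option #1 parses the expression string again on every call.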
    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathExpression;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;

    public class Demo {

        public static void main(String[] args) {
            DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
            try {
                DocumentBuilder builder = domFactory.newDocumentBuilder();
                File xml = new File("input.xml");
                Document dDoc = builder.parse(xml);
                NodeList nl;

                // OPTION #1: the expression string is parsed on every evaluate() call
                XPath xPath = XPathFactory.newInstance().newXPath();
                nl = (NodeList) xPath.evaluate("/root/device/modelname", dDoc, XPathConstants.NODESET);
                printResults(nl);
                nl = (NodeList) xPath.evaluate("/root/device/modelname", dDoc, XPathConstants.NODESET);
                printResults(nl);

                // OPTION #2: compile once, then reuse the compiled expression
                XPathExpression xPathExpression = xPath.compile("/root/device/modelname");
                nl = (NodeList) xPathExpression.evaluate(dDoc, XPathConstants.NODESET);
                printResults(nl);
                nl = (NodeList) xPathExpression.evaluate(dDoc, XPathConstants.NODESET);
                printResults(nl);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        // Prints the text content of every node in the result set
        private static void printResults(NodeList nl) {
            for (int x = 0; x < nl.getLength(); x++) {
                System.out.println("the value is: " + nl.item(x).getTextContent());
            }
        }
    }
input.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <root>
        <blah>foo</blah>
        <device>
            <modelname>xbox</modelname>
        </device>
        <blah>bar</blah>
        <device>
            <modelname>wii</modelname>
        </device>
        <blah/>
    </root>
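Since the question is about a large number of files, the same idea extends to a batch: compile the expression once and reuse it (and the DocumentBuilder) for every document. The following is only a minimal sketch under assumptions. The class name BatchSearch and the helpers search and stream are hypothetical, and in-memory streams stand in for the real files. Neither DocumentBuilder nor XPathExpression is thread-safe, so this version is deliberately single-threaded.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical class name, used for illustration only
class BatchSearch {

    // Hypothetical helper: evaluates one precompiled expression against every input
    static List<String> search(List<InputStream> inputs, String expression) {
        try {
            // Created once and reused for all documents
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            XPathExpression compiled = XPathFactory.newInstance().newXPath().compile(expression);

            List<String> values = new ArrayList<>();
            for (InputStream in : inputs) {
                Document doc = builder.parse(in);
                NodeList nl = (NodeList) compiled.evaluate(doc, XPathConstants.NODESET);
                for (int i = 0; i < nl.getLength(); i++) {
                    values.add(nl.item(i).getTextContent());
                }
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath batch search failed", e);
        }
    }

    // Hypothetical helper: wraps an XML string as a stream (stands in for a FileInputStream)
    static InputStream stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        List<InputStream> docs = List.of(
            stream("<root><device><modelname>xbox</modelname></device></root>"),
            stream("<root><device><modelname>wii</modelname></device></root>"));
        for (String value : search(docs, "/root/device/modelname")) {
            System.out.println("the value is: " + value);
        }
    }
}
```

With real files you would pass a FileInputStream (or call builder.parse(File)) instead of the in-memory streams. If you later spread the work across threads, each thread would need its own DocumentBuilder and XPathExpression.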
I wonder if the XPath searching is really your bottleneck, or whether it's actually the XML parsing? I would suspect the latter. I don't know how persistent your XML documents are, but I would think the solution is to store them in an XML database so you only incur the parsing cost once, and so that they can be indexed to make XPath/XQuery searching more efficient.
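One rough way to check which step dominates is to time the parse and the XPath evaluation separately. The sketch below is only an illustration under assumptions. The class name ParseVsXPathTiming and the method measure are hypothetical, and a single System.nanoTime measurement with no JIT warm-up is crude, so treat the numbers as a hint rather than a benchmark.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical class name, used for illustration only
class ParseVsXPathTiming {

    // Hypothetical helper: returns { parse nanos, evaluate nanos, number of matches }
    static long[] measure(byte[] xml, String expression) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            XPathExpression compiled = XPathFactory.newInstance().newXPath().compile(expression);

            long start = System.nanoTime();
            Document doc = builder.parse(new ByteArrayInputStream(xml));
            long parsed = System.nanoTime();
            NodeList nl = (NodeList) compiled.evaluate(doc, XPathConstants.NODESET);
            long evaluated = System.nanoTime();

            return new long[] { parsed - start, evaluated - parsed, nl.getLength() };
        } catch (Exception e) {
            throw new IllegalStateException("timing run failed", e);
        }
    }

    public static void main(String[] args) {
        // Small in-memory sample; substitute the bytes of one of your real files
        String sample = "<root><device><modelname>xbox</modelname></device>"
                + "<device><modelname>wii</modelname></device></root>";
        long[] result = measure(sample.getBytes(StandardCharsets.UTF_8), "/root/device/modelname");
        System.out.println("parse:    " + result[0] + " ns");
        System.out.println("evaluate: " + result[1] + " ns");
        System.out.println("matches:  " + result[2]);
    }
}
```

Running this over a representative sample of your files, repeated a few times, should give a rough idea of whether parsing or searching takes more of the time.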