
Faster API than javax.xml.xpath to search XML for a value?

Tags:

java

xml

I am using javax.xml.xpath to search for specific strings in XML files. However, because of the huge number of XML files that need to be searched, this is turning out to be much slower than expected.

Is there an API available in Java that is faster than javax.xml.xpath, and which is the fastest one available?

Nohsib asked Jun 24 '11 17:06



2 Answers

As pointed out by skaffman, make sure you are using the javax.xml.xpath libraries as efficiently as possible. If you execute an XPath statement more than once, compile it into an XPathExpression and reuse the compiled object.

XPathExpression xPathExpression = xPath.compile("/root/device/modelname");
NodeList nl = (NodeList) xPathExpression.evaluate(dDoc, XPathConstants.NODESET);

Demo

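In the example below, option #2 will be faster than option #1 because the expression is compiled once and then reused, whereas option #1 has to process the expression string again on every call.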

import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class Demo {

    public static void main(String[] args) {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder builder = domFactory.newDocumentBuilder();
            File xml = new File("input.xml");
            Document dDoc = builder.parse(xml);

            NodeList nl;

            // OPTION #1
            XPath xPath = XPathFactory.newInstance().newXPath();
            nl = (NodeList) xPath.evaluate("/root/device/modelname", dDoc, XPathConstants.NODESET);
            printResults(nl);
            nl = (NodeList) xPath.evaluate("/root/device/modelname", dDoc, XPathConstants.NODESET);
            printResults(nl);

            // OPTION #2
            XPathExpression xPathExpression = xPath.compile("/root/device/modelname");
            nl = (NodeList) xPathExpression.evaluate(dDoc, XPathConstants.NODESET);
            printResults(nl);
            nl = (NodeList) xPathExpression.evaluate(dDoc, XPathConstants.NODESET);
            printResults(nl);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void printResults(NodeList nl) {
        for(int x=0; x<nl.getLength(); x++) {
            System.out.println("the value is: " + nl.item(x).getTextContent());
        }
    }

}

input.xml

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <blah>foo</blah>
  <device>
    <modelname>xbox</modelname>
  </device>
  <blah>bar</blah>
  <device>
    <modelname>wii</modelname>
  </device>
  <blah/>
</root>
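
Since the question is about searching a large number of files, the same idea can be extended to many documents: compile the expression once and evaluate it against each parsed document. The following is only a minimal sketch under assumptions. The class name CompiledXPathReuse and the method findAll are hypothetical, and the documents are passed in as strings so that the example runs without any files on disk.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: not part of the original answer.
public class CompiledXPathReuse {

    // Collects the text of every node matched by the expression across all documents.
    public static List<String> findAll(List<String> xmlDocs, String expression) throws Exception {
        // The builder and the compiled expression are created once, outside the loop.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        XPathExpression compiled = XPathFactory.newInstance().newXPath().compile(expression);

        List<String> values = new ArrayList<>();
        for (String xml : xmlDocs) {
            // Each document still has to be parsed; only the XPath compilation is shared.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nl = (NodeList) compiled.evaluate(doc, XPathConstants.NODESET);
            for (int i = 0; i < nl.getLength(); i++) {
                values.add(nl.item(i).getTextContent());
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = Arrays.asList(
            "<root><device><modelname>xbox</modelname></device></root>",
            "<root><device><modelname>wii</modelname></device></root>");
        System.out.println(findAll(docs, "/root/device/modelname")); // prints [xbox, wii]
    }
}
```

Note that XPath and XPathExpression objects are not thread-safe, so if the files are processed on several threads, each thread would need its own compiled expression.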
bdoughan answered Sep 19 '22 23:09



I wonder if the XPath searching is really your bottleneck, or whether it's actually the XML parsing? I would suspect the latter. I don't know how persistent your XML documents are, but I would think the solution is to store them in an XML database so you only incur the parsing cost once, and so that they can be indexed to make XPath/XQuery searching more efficient.
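
One way to check which of the two costs dominates is to time the parsing and the XPath evaluation separately. This is a rough sketch under assumptions, not a rigorous benchmark: the class name ParseVsXPathTiming, the method measure, the sample document, and the round counts are all made up for illustration, and a proper benchmarking harness would give more reliable numbers.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: not part of the original answer.
public class ParseVsXPathTiming {

    // Returns { total parse time, total XPath evaluation time } in nanoseconds.
    public static long[] measure(String xml, String expression, int rounds) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        XPathExpression compiled = XPathFactory.newInstance().newXPath().compile(expression);

        long parseNanos = 0;
        long evalNanos = 0;
        for (int i = 0; i < rounds; i++) {
            long t0 = System.nanoTime();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            long t1 = System.nanoTime();
            compiled.evaluate(doc, XPathConstants.NODESET);
            long t2 = System.nanoTime();

            parseNanos += t1 - t0;
            evalNanos += t2 - t1;
        }
        return new long[] { parseNanos, evalNanos };
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><device><modelname>xbox</modelname></device>"
                   + "<device><modelname>wii</modelname></device></root>";

        // Discard a warm-up run so that JIT compilation skews the numbers less.
        measure(xml, "/root/device/modelname", 1000);

        long[] t = measure(xml, "/root/device/modelname", 1000);
        System.out.println("parse ms:    " + t[0] / 1_000_000.0);
        System.out.println("evaluate ms: " + t[1] / 1_000_000.0);
    }
}
```

To get meaningful numbers, the sample string would need to be replaced with documents of realistic size, since the balance between parsing and searching depends on the input. Reading the files from disk adds I/O cost on top of what this sketch measures.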

Michael Kay answered Sep 21 '22 23:09
