Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

fastest way to query xml in java [duplicate]

What is the fastest way to query a huge XML file in java,

DOM - xpath : this is taking lot of time,

     DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
     docBuilderFactory.setNamespaceAware(true);

     DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
     Document document = docBuilder.parse(new File("test.xml"));

     XPath xpath = XPathFactory.newInstance().newXPath();

     String xPath = "/*/*[@id='ABCD']/*/*";

     XPathExpression expr = xpath.compile(xPath);
     //this line takes lot of time
     NodeList result = (NodeList)expr.evaluate(document, XPathConstants.NODESET);

with last line in code, program finishes in 40 secs and without it in 1 second.

SAX : I don't know if this can be used for query, on internet I am only able to find the examples of parsing.

What are the other options to make query faster, the size of my xml file is around 5MB. Thnx

like image 218
Mahender Singh Avatar asked Oct 04 '22 18:10

Mahender Singh


1 Answers

If your id attributes are of type xs:ID and you have an XML schema for your document then you can use the Document.getElementById(String) method. I will demonstrate below with an example.

XML Schema

<?xml version="1.0" encoding="UTF-8"?>
<schema 
    xmlns="http://www.w3.org/2001/XMLSchema" 
    targetNamespace="http://www.example.org/schema" 
    xmlns:tns="http://www.example.org/schema" 
    elementFormDefault="qualified">

    <element name="foo">
        <complexType>
            <sequence>
                <element ref="tns:bar" maxOccurs="unbounded"/>
            </sequence>
        </complexType>
    </element>

    <element name="bar">
        <complexType>
            <attribute name="id" type="ID"/>
        </complexType>
    </element>

</schema>

XML Input (input.xml)

<?xml version="1.0" encoding="UTF-8"?>
<foo xmlns="http://www.example.org/schema">
    <bar id="ABCD"/>
    <bar id="EFGH"/>
    <bar id="IJK"/>
</foo>

Demo

You will need to set the instance of Schema on the DocumentBuilderFactory to get everything to work.

import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.*;
import javax.xml.validation.*;
import org.w3c.dom.*;

public class Demo {

    public static void main(String[] args) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new File("src/forum17250259/schema.xsd"));

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setSchema(schema);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document document = db.parse(new File("src/forum17250259/input.xml"));

        Element result = document.getElementById("EFGH");
        System.out.println(result);
    }

}
like image 84
bdoughan Avatar answered Oct 12 '22 10:10

bdoughan