What is the fastest way to query a huge XML file in Java?
DOM with XPath: this takes a lot of time.
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
docBuilderFactory.setNamespaceAware(true);
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document document = docBuilder.parse(new File("test.xml"));
XPath xpath = XPathFactory.newInstance().newXPath();
String xPath = "/*/*[@id='ABCD']/*/*";
XPathExpression expr = xpath.compile(xPath);
// This line takes a lot of time
NodeList result = (NodeList)expr.evaluate(document, XPathConstants.NODESET);
With the last line included, the program finishes in 40 seconds; without it, in 1 second.
SAX: I don't know whether this can be used for querying; on the internet I can only find examples of parsing.
What other options would make the query faster? My XML file is around 5 MB. Thanks.
If your id attributes are of type xs:ID and you have an XML schema for your document, then you can use the Document.getElementById(String) method. I will demonstrate below with an example.
XML Schema
<?xml version="1.0" encoding="UTF-8"?>
<schema
xmlns="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.example.org/schema"
xmlns:tns="http://www.example.org/schema"
elementFormDefault="qualified">
<element name="foo">
<complexType>
<sequence>
<element ref="tns:bar" maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
<element name="bar">
<complexType>
<attribute name="id" type="ID"/>
</complexType>
</element>
</schema>
XML Input (input.xml)
<?xml version="1.0" encoding="UTF-8"?>
<foo xmlns="http://www.example.org/schema">
<bar id="ABCD"/>
<bar id="EFGH"/>
<bar id="IJK"/>
</foo>
Demo
You will need to set the instance of Schema on the DocumentBuilderFactory to get everything to work.
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.*;
import javax.xml.validation.*;
import org.w3c.dom.*;
public class Demo {
    public static void main(String[] args) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new File("src/forum17250259/schema.xsd"));
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setSchema(schema);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document document = db.parse(new File("src/forum17250259/input.xml"));
        Element result = document.getElementById("EFGH");
        System.out.println(result);
    }
}
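The original query, /*/*[@id='ABCD']/*/*, selects the grandchildren of the matching element rather than the element itself. One way to get the same node set, sketched below, is to use the element returned by getElementById as the context node for the relative expression */*, which should avoid testing the id attribute of every candidate element. The class name IdLookupSketch, the inline schema, and the root/item/group/leaf element names are hypothetical and only there to keep the example self-contained; the document in the question will have a different structure.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class IdLookupSketch {

    // Hypothetical schema: each "item" has an xs:ID attribute and nested group/leaf children.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='root'><xs:complexType><xs:sequence>"
      + "<xs:element name='item' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
      + "<xs:element name='group' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
      + "<xs:element name='leaf' type='xs:string' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence><xs:attribute name='id' type='xs:ID'/></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Hypothetical input matching the schema above.
    static final String XML =
        "<root>"
      + "<item id='ABCD'><group><leaf>a</leaf><leaf>b</leaf></group></item>"
      + "<item id='EFGH'><group><leaf>c</leaf></group></item>"
      + "</root>";

    // Counts the grandchildren of the element whose xs:ID attribute equals the given id.
    static int countGrandchildren(String id) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));

            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setSchema(schema); // the schema is what marks the attribute as an ID

            Document document = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            // Direct lookup by ID instead of an XPath predicate over the whole document.
            Element start = document.getElementById(id);
            if (start == null) {
                return 0; // no element with that ID
            }

            // Relative XPath: the grandchildren of the element found above.
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList result = (NodeList) xpath.evaluate("*/*", start, XPathConstants.NODESET);
            return result.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XML lookup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countGrandchildren("ABCD"));
    }
}
```

Whether this is actually faster on the 5 MB file depends on where the time is being spent, so it is worth measuring both variants on the real document.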