Loading local chunks in DOM while parsing a large XML file in SAX (Java)

Tags: java, dom, xml, xpath, sax

I have an XML file that I would like to avoid loading entirely into memory. As everyone knows, for such a file I should use a SAX parser (which walks through the file and fires events when something relevant is found).

My current problem is that I would like to process the file chunk by chunk, which means:

1. Parse the file and find a relevant tag (node).
2. Load this tag entirely into memory (as we would with DOM).
3. Process this entity (the local chunk).
4. When I'm done with the chunk, release it and continue from step 1 (until the end of the file).

In a perfect world, I'm looking for something like this:

// 1. Create a parser and set the file to load
IdealParser p = new IdealParser("BigFile.xml");
// 2. Set an XPath to define the interesting nodes
p.setRelevantNodesPath("/path/to/relevant/nodes");
// 3. Add a handler to call back the right method once a node is found
p.setHandler(new Handler() {
    // 4. The method called back by the parser when a relevant node is found
    void aNodeIsFound(saxNode aNode) {
        // 5. Inflate the current node, i.e. load it (and all its content) in memory
        DomNode d = aNode.expand();
        // 6. Do something with the inflated node (method to be defined somewhere)
        doThingWithNode(d);
    }
});
// 7. Start the parser
p.start();

I'm currently stuck on how to efficiently expand a "SAX node" (for lack of a better term) into a DOM fragment.

Is there any Java framework or library relevant to this kind of task?

asked Nov 03 '11 16:11

Flavien Volken

1 Answer

UPDATE

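You could also just use the javax.xml.xpath APIs. Note that, as far as I know, the JDK's default implementation parses the whole document into a DOM before evaluating the expression, so this is simpler but does not avoid loading the file into memory: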

package forum7998733;

import java.io.FileReader;
import javax.xml.xpath.*;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XPathDemo {

    public static void main(String[] args) throws Exception {
        XPathFactory xpf = XPathFactory.newInstance();
        XPath xpath = xpf.newXPath();
        InputSource xml = new InputSource(new FileReader("BigFile.xml"));
        Node result = (Node) xpath.evaluate("/path/to/relevant/nodes", xml, XPathConstants.NODE);
        System.out.println(result);
    }

}
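
The call above asks for XPathConstants.NODE, which returns only the first match. If every matching node is needed, a variant along these lines could request XPathConstants.NODESET and iterate over the result. This is only a sketch: the class name XPathNodeSetDemo, the accounts helper, and the inline sample document are made up for illustration, and it assumes the expression selects element nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathNodeSetDemo {

    /**
     * Hypothetical helper: returns the "account" attribute of every node
     * matched by the expression. Assumes the matches are element nodes.
     */
    public static List<String> accounts(InputSource xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // NODESET yields all matches as a NodeList instead of just the first node
        NodeList nodes = (NodeList) xpath.evaluate(expression, xml, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            result.add(element.getAttribute("account"));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Small inline document standing in for BigFile.xml
        String sample = "<statements>"
                + "<statement account=\"123\">a</statement>"
                + "<statement account=\"456\">b</statement>"
                + "</statements>";
        InputSource xml = new InputSource(new StringReader(sample));
        System.out.println(accounts(xml, "/statements/statement"));
    }
}
```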

Below is a sample of how it could be done with StAX.

input.xml

Below is some sample XML:

<statements>
   <statement account="123">
      ...stuff...
   </statement>
   <statement account="456">
      ...stuff...
   </statement>
</statements>

Demo

In this example a StAX XMLStreamReader is used to find the node that will be converted to a DOM. Here each statement fragment is converted to its own DOM, but your navigation algorithm could be more advanced.

package forum7998733;

import java.io.FileReader;
import javax.xml.stream.*;
import javax.xml.transform.*;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.dom.*;

public class Demo {

    public static void main(String[] args) throws Exception  {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("src/forum7998733/input.xml"));
        xsr.nextTag(); // Advance to statements element

        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        // nextTag() skips whitespace up to the next tag; with the sample input
        // the loop ends when it reaches the closing </statements> tag
        while(xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
            // Copy the current statement element (and its content) into a new DOM
            DOMResult domResult = new DOMResult();
            t.transform(new StAXSource(xsr), domResult);

            // Write the DOM chunk to System.out to show what was loaded
            DOMSource domSource = new DOMSource(domResult.getNode());
            StreamResult streamResult = new StreamResult(System.out);
            t.transform(domSource, streamResult);
        }
    }

}

Output

<?xml version="1.0" encoding="UTF-8" standalone="no"?><statement account="123">
      ...stuff...
   </statement><?xml version="1.0" encoding="UTF-8" standalone="no"?><statement account="456">
      ...stuff...
   </statement>
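
To get closer to the callback style sketched in the question, the same StAX-to-DOM idea could be wrapped in a small helper that hands each chunk to a handler. The following is a minimal sketch, not a library API: ChunkProcessor and forEachChunk are hypothetical names, chunks are matched by local element name rather than by a real XPath, and the inline XML stands in for the big file. The loop re-checks the reader's current event after each transform instead of calling nextTag(), because the position of the reader after the transform can differ (for example when two chunks are not separated by whitespace).

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ChunkProcessor {

    /**
     * Hypothetical helper: streams the XML and passes every element whose
     * local name equals chunkName to the handler as a standalone DOM Element.
     * Only one chunk is held as a DOM at a time.
     */
    public static void forEachChunk(Reader xml, String chunkName, Consumer<Element> handler)
            throws Exception {
        XMLStreamReader xsr = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        Transformer t = TransformerFactory.newInstance().newTransformer();
        try {
            while (xsr.hasNext()) {
                boolean atChunk = xsr.getEventType() == XMLStreamConstants.START_ELEMENT
                        && chunkName.equals(xsr.getLocalName());
                if (atChunk) {
                    // Copy the current element and its content into a fresh DOM
                    DOMResult result = new DOMResult();
                    t.transform(new StAXSource(xsr), result);
                    Node node = result.getNode();
                    Element chunk = (node instanceof Document)
                            ? ((Document) node).getDocumentElement()
                            : (Element) node;
                    handler.accept(chunk);
                    // No next() here: the transform has already moved the reader,
                    // so the loop simply inspects whatever event it is on now
                } else {
                    xsr.next();
                }
            }
        } finally {
            xsr.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Small inline document standing in for the large input file
        String sample = "<statements>"
                + "<statement account=\"123\">a</statement>"
                + "<statement account=\"456\">b</statement>"
                + "</statements>";
        forEachChunk(new StringReader(sample), "statement",
                chunk -> System.out.println(chunk.getAttribute("account")));
    }
}
```

Inside the handler each chunk is an ordinary DOM Element, so it can be processed with the usual DOM or XPath APIs and becomes eligible for garbage collection once the handler returns.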

answered Sep 19 '22 01:09

bdoughan
