 

Split a 1 GB XML file using Java

Tags:

java

xml

I have a 1 GB XML file. How can I split it into smaller, well-formed XML files using Java?

Here is an example:

<records>
  <record id="001">
    <name>john</name>
  </record>
 ....
</records>

Thanks.

asked Mar 02 '11 at 15:03 by user534009




2 Answers

I would use a StAX parser for this situation. It will prevent the entire document from being read into memory at one time.

  1. Advance the XMLStreamReader to the local root element of the sub-fragment.
  2. You can then use the javax.xml.transform APIs to produce a new document from this XML fragment. This will advance the XMLStreamReader to the end of that fragment.
  3. Repeat step 1 for the next fragment.
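Steps 1 and 3 depend on StAX's pull model: the program asks the parser for one event at a time instead of building a tree. Here is a minimal sketch of that loop in isolation. It assumes the question's `<record>` element name, and `RecordCounter` is a hypothetical class name. It counts records without ever holding the whole document, so memory use does not grow with the 1 GB input:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class RecordCounter {

    /** Counts <record> start tags by pulling parse events one at a time. */
    public static long countRecords(Path xml) throws IOException, XMLStreamException {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        // A byte stream lets the parser honour the document's declared encoding.
        try (InputStream in = Files.newInputStream(xml)) {
            XMLStreamReader xsr = xif.createXMLStreamReader(in);
            long count = 0;
            try {
                while (xsr.hasNext()) {
                    // next() moves to the following event and returns its type;
                    // no document tree is ever built.
                    if (xsr.next() == XMLStreamConstants.START_ELEMENT
                            && "record".equals(xsr.getLocalName())) {
                        count++;
                    }
                }
            } finally {
                xsr.close(); // releases parser state; the try-with-resources closes the file
            }
            return count;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countRecords(Path.of(args[0])));
    }
}
```

Run it as `java RecordCounter input.xml`. A count like this can help you decide how many records to put in each output file.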

Code Example

For the following XML, write each "statement" element to its own file, named after the value of its "account" attribute:

<statements>
   <statement account="123">
      ...stuff...
   </statement>
   <statement account="456">
      ...stuff...
   </statement>
</statements>

This can be done with the following code:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

public class Demo {

    public static void main(String[] args) throws Exception {
        File outDir = new File("out");
        outDir.mkdirs(); // StreamResult does not create missing directories

        XMLInputFactory xif = XMLInputFactory.newInstance();
        // Pass a byte stream (not a FileReader) so the parser honours the
        // encoding declared in the XML instead of the platform default
        try (InputStream in = new FileInputStream("input.xml")) {
            XMLStreamReader xsr = xif.createXMLStreamReader(in);
            xsr.nextTag(); // Advance to statements element

            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer t = tf.newTransformer();
            // Each iteration starts on a <statement> start tag; transform() copies
            // that subtree to its own file and leaves the reader at its end tag
            while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
                File file = new File(outDir, xsr.getAttributeValue(null, "account") + ".xml");
                t.transform(new StAXSource(xsr), new StreamResult(file));
            }
            xsr.close();
        }
    }

}
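The code above writes one file per "statement". For a 1 GB input, that could mean a very large number of files. If the goal is just smaller, well-formed files, one possible variation is to put a fixed number of child elements in each file.

The sketch below swaps step 2's `javax.xml.transform` copy for direct StAX event copying (`XMLEventReader` to `XMLEventWriter`), so one output file can stay open across several records. The class name `BatchSplitter`, the `part-N.xml` naming, and the decision to drop whitespace between records are illustrative assumptions, not part of the approach described above:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class BatchSplitter {

    private static final XMLEventFactory EF = XMLEventFactory.newInstance();

    /**
     * Streams input and copies the root's child elements into outDir/part-1.xml,
     * part-2.xml, ..., at most perFile children per file. Each part is wrapped in
     * a copy of the input's root start tag so it is well-formed on its own.
     * Text directly under the root (e.g. indentation) is dropped.
     * Returns the number of part files written.
     */
    public static int split(Path input, Path outDir, int perFile)
            throws IOException, XMLStreamException {
        if (perFile < 1) throw new IllegalArgumentException("perFile must be >= 1");
        Files.createDirectories(outDir);
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLOutputFactory xof = XMLOutputFactory.newInstance();

        StartElement root = null;     // input's root start tag, reused as the wrapper
        OutputStream out = null;      // current part file; null between parts
        XMLEventWriter writer = null;
        int depth = 0, inPart = 0, parts = 0;

        try (InputStream in = Files.newInputStream(input)) {
            XMLEventReader reader = xif.createXMLEventReader(in);
            while (reader.hasNext()) {
                XMLEvent e = reader.nextEvent();
                if (e.isStartElement()) {
                    depth++;
                    if (depth == 1) {
                        root = e.asStartElement();
                        continue;
                    }
                    if (depth == 2 && writer == null) {
                        // First child of a new part: open the file, write the wrapper.
                        out = Files.newOutputStream(outDir.resolve("part-" + (++parts) + ".xml"));
                        writer = xof.createXMLEventWriter(out, "UTF-8");
                        writer.add(EF.createStartDocument("UTF-8", "1.0"));
                        writer.add(root);
                    }
                }
                if (depth >= 2) {
                    writer.add(e); // copy every event that lies inside a child element
                }
                if (e.isEndElement()) {
                    depth--;
                    // depth 1 after the decrement means a child of the root just closed.
                    if (depth == 1 && ++inPart == perFile) {
                        closePart(writer, out, root);
                        writer = null;
                        out = null;
                        inPart = 0;
                    }
                }
            }
            if (writer != null) {
                closePart(writer, out, root); // last, partly filled part
            }
            reader.close();
        }
        return parts;
    }

    private static void closePart(XMLEventWriter writer, OutputStream out, StartElement root)
            throws IOException, XMLStreamException {
        QName name = root.getName();
        writer.add(EF.createEndElement(name.getPrefix(), name.getNamespaceURI(), name.getLocalPart()));
        writer.add(EF.createEndDocument());
        writer.close(); // flushes the XML writer
        out.close();    // closes the part file itself
    }
}
```

For example, `BatchSplitter.split(Path.of("input.xml"), Path.of("out"), 10_000)` would write `out/part-1.xml`, `out/part-2.xml`, and so on, with up to 10,000 records each. Because every part reuses the original root start tag, each file keeps the `<records>` wrapper (and any namespace declarations on it) from the question's format.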
answered Sep 19 '22 at 07:09 by bdoughan


Try this, using Saxon-EE 9.3.

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:mode streamable="yes"/>
    <xsl:template match="record">
      <xsl:result-document href="record-{@id}.xml">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:template>
</xsl:stylesheet>

The software isn't free, but if it saves you a day's coding you can easily justify the investment. (Apologies for the sales pitch.)

answered Sep 22 '22 at 07:09 by Michael Kay