I have a 1GB Xml file, how can I split it into well-formed, smaller size Xml files using Java ?
Here is an example:
<records>
<record id="001">
<name>john</name>
</record>
....
</records>
Thanks.
Split large XML file in Windows (Method #1) First, click the “Add XML File(s)” button to provide the input path of the file to split, or easily drag and drop your files. Then select the tag by which the new file will be split. Next, choose after what period of tags to split into a new file.
An XML file is a text file and you can open it with any text editor, even one as simple as Notepad.
I would use a StAX parser for this situation. It will prevent the entire document from being read into memory at one time.
Code Example
For the following XML, output each "statement" section into a file named after the "account attributes value":
<statements>
<statement account="123">
...stuff...
</statement>
<statement account="456">
...stuff...
</statement>
</statements>
This can be done with the following code:
import java.io.File;
import java.io.FileReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
public class Demo {
public static void main(String[] args) throws Exception {
XMLInputFactory xif = XMLInputFactory.newInstance();
XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
xsr.nextTag(); // Advance to statements element
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
while(xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
File file = new File("out/" + xsr.getAttributeValue(null, "account") + ".xml");
t.transform(new StAXSource(xsr), new StreamResult(file));
}
}
}
Try this, using Saxon-EE 9.3.
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:mode streamable="yes"/>
<xsl:template match="record">
<xsl:result-document href="record-{@id}.xml">
<xsl:copy-of select="."/>
</xsl:result-document>
</xsl:template>
</xsl:stylesheet>
The software isn't free, but if it saves you a day's coding you can easily justify the investment. (Apologies for the sales pitch).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With