We have a scenario where we need to split a large XML file (more than 10 GB) into small chunks. Each chunk should contain 100 or 200 elements. Example XML:
<Employees>
<Employee id="1">
<age>29</age>
<name>Pankaj</name>
<gender>Male</gender>
<role>Java Developer</role>
</Employee>
<Employee id="3">
<age>35</age>
<name>Lisa</name>
<gender>Female</gender>
<role>CEO</role>
</Employee>
<Employee id="3">
<age>40</age>
<name>Tom</name>
<gender>Male</gender>
<role>Manager</role>
</Employee>
<Employee id="3">
<age>25</age>
<name>Meghna</name>
<gender>Female</gender>
<role>Manager</role>
</Employee>
<Employee id="3">
<age>29</age>
<name>Pankaj</name>
<gender>Male</gender>
<role>Java Developer</role>
</Employee>
<Employee id="3">
<age>35</age>
<name>Lisa</name>
<gender>Female</gender>
<role>CEO</role>
</Employee>
<Employee id="3">
<age>40</age>
<name>Tom</name>
<gender>Male</gender>
<role>Manager</role>
</Employee>
</Employees>
I have StAX parser code that splits the file into small chunks, but each output file contains only one complete Employee element, whereas I need 100, 200, or more <Employee> elements in a single file. Here is my Java code:
public static void main(String[] s) throws Exception{
String prefix = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"+"\n";
String suffix = "\n</Employees>\n";
int count=0;
try {
int i=0;
XMLInputFactory xif = XMLInputFactory.newInstance();
XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("D:\\Desktop\\Test\\latestxml\\test.xml"));
xsr.nextTag(); // Advance to the root <Employees> element
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
while(xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
File file = new File("C:\\Users\\test\\Desktop\\xml\\"+"out" +i+ ".xml");
FileOutputStream fos=new FileOutputStream(file,true);
t.transform(new StAXSource(xsr), new StreamResult(fos));
i++;
}
} catch (Exception e) {
e.printStackTrace();
}
}
Do not increment i on every iteration. It should only be updated once the element count for the current file reaches 100 or 200.
Like:
String outputPath = "/test/path/foo.txt";
while(xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
FileOutputStream file = new FileOutputStream(outputPath,true);
...
...
count ++;
if(count == 100){
i++;
outputPath = "/test/path/foo" + i + ".txt";
count = 0;
}
}
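To make the counting rule above concrete: resetting a counter every 100 elements and bumping i is equivalent to integer-dividing the element's position by the chunk size. Here is a minimal sketch of just that arithmetic, with no I/O. The class name ChunkNames, the method fileFor, and the "out" + index + ".xml" naming are assumptions made for illustration (the naming mirrors the question's code).

```java
class ChunkNames {

    /**
     * Returns the name of the file that should receive the element at the
     * given zero-based position. Hypothetical helper, not part of StAX.
     */
    static String fileFor(long elementIndex, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // Integer division: positions 0..chunkSize-1 map to file 0,
        // the next chunkSize positions map to file 1, and so on.
        return "out" + (elementIndex / chunkSize) + ".xml";
    }

    public static void main(String[] args) {
        System.out.println(fileFor(0, 100));   // out0.xml
        System.out.println(fileFor(99, 100));  // out0.xml
        System.out.println(fileFor(100, 100)); // out1.xml
    }
}
```

So with a chunk size of 100, the first hundred Employee elements share one file and the file index only changes on the 101st.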
I hope I understood the question correctly: you only need to increment count each time you add one Employee, and switch to a new file when it reaches the limit.
File file = new File("out" + i + ".xml");
FileOutputStream fos = new FileOutputStream(file, true);
appendStuff("<Employees>",file);
while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
count++;
t.transform(new StAXSource(xsr), new StreamResult(fos));
if(count == 100) {
count = 0;
i++;
appendStuff("</Employees>",file);
fos.close();
file = new File("out" + i + ".xml");
fos = new FileOutputStream(file, true);
appendStuff("<Employees>",file);
}
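}
// Close the last, possibly partial chunk so it is well-formed too
appendStuff("</Employees>", file);
fos.close();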
It's not very nice, but you get the idea.
private static void appendStuff(String content, File file) throws IOException {
FileWriter fw = new FileWriter(file.getAbsoluteFile(),true);
BufferedWriter bw = new BufferedWriter(fw);
bw.write(content);
bw.close();
}
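For comparison, here is a sketch of the same idea that avoids mixing a Transformer with a separate FileWriter: it copies events from an XMLEventReader to an XMLEventWriter and starts a new file every chunkSize child elements. This is one possible approach, not the code from the answers above. The class name XmlChunkSplitter, the split method, and the out<N>.xml naming are assumptions, and it also assumes the root element has no attributes or namespace that need to be carried over.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class XmlChunkSplitter {

    /** Splits the root's child elements into files of at most chunkSize elements; returns the file count. */
    static int split(Path input, Path outDir, int chunkSize) throws IOException, XMLStreamException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();
        int fileIndex = 0;   // number of chunk files opened so far
        int inChunk = 0;     // child elements written to the current chunk
        int depth = 0;       // 1 = root element, 2 = direct child of the root
        String rootName = null;
        OutputStream out = null;
        XMLEventWriter writer = null;

        try (InputStream in = Files.newInputStream(input)) {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    depth++;
                    if (depth == 1) {
                        // Remember the root name; each chunk gets its own copy of the wrapper.
                        rootName = event.asStartElement().getName().getLocalPart();
                        continue;
                    }
                    if (depth == 2 && writer == null) {
                        // First child of a new chunk: open the next file and write the wrapper.
                        out = Files.newOutputStream(outDir.resolve("out" + fileIndex++ + ".xml"));
                        writer = outFactory.createXMLEventWriter(out, "UTF-8");
                        writer.add(events.createStartDocument("UTF-8", "1.0"));
                        writer.add(events.createStartElement("", "", rootName));
                    }
                }
                if (depth >= 2) {
                    writer.add(event); // copy everything that belongs to a child element
                }
                if (event.isEndElement()) {
                    if (depth == 2 && ++inChunk == chunkSize) {
                        closeChunk(writer, out, events, rootName);
                        writer = null;
                        inChunk = 0;
                    }
                    depth--;
                }
            }
            reader.close();
        } finally {
            if (writer != null) {
                // Last, partially filled chunk.
                closeChunk(writer, out, events, rootName);
            }
        }
        return fileIndex;
    }

    private static void closeChunk(XMLEventWriter writer, OutputStream out,
                                   XMLEventFactory events, String rootName)
            throws IOException, XMLStreamException {
        writer.add(events.createEndElement("", "", rootName));
        writer.add(events.createEndDocument());
        writer.close(); // does not close the underlying stream
        out.close();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java XmlChunkSplitter <input.xml> <outputDir> <chunkSize>
        int files = split(Paths.get(args[0]), Paths.get(args[1]), Integer.parseInt(args[2]));
        System.out.println("Wrote " + files + " chunk file(s)");
    }
}
```

Because only one Employee's worth of events is in flight at a time, memory use should stay roughly flat regardless of input size, which is the point of using StAX for a 10 GB file. Whitespace between Employee elements is dropped, so the output is not pretty-printed.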