I'm currently trying to use JAXB to unmarshal an XML file, but it seems the file is too large (~500 MB) for the unmarshaller to handle. I keep getting java.lang.OutOfMemoryError: Java heap space:
Unmarshaller um = JAXBContext.newInstance("com.sample.xml").createUnmarshaller();
Export e = (Export) um.unmarshal(new File("SAMPLE.XML"));
I'm guessing this is because it's trying to build the entire file as an object graph in memory, and the file is just too large for the Java heap space.
Is there a more memory-efficient way to parse a large (~500 MB) XML file? Or perhaps an unmarshaller property that could help?
Here's what my XML looks like:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- ... -->
<Export xmlns="www.foo.com" xmlns:xsi="www.foo1.com" xsi:schemaLocation="www.foo2.com/.xsd">
    <!-- ... -->
    <Origin ID="foooo" />
    <!-- ... -->
    <WorkSets>
        <WorkSet>
            <Work>
                .....
            </Work>
            <Work>
                ....
            </Work>
            <Work>
                .....
            </Work>
        </WorkSet>
        <WorkSet>
            ....
        </WorkSet>
    </WorkSets>
</Export>
I'd like to unmarshal at the WorkSet level, while still being able to read through all of the Work elements within each WorkSet.
What does your XML look like? Typically for large documents I recommend people leverage a StAX XMLStreamReader so that the document can be unmarshalled by JAXB in chunks.
input.xml
In the document below there are many instances of the person element. We can use JAXB with a StAX XMLStreamReader to unmarshal the corresponding Person objects one at a time to avoid running out of memory.
<people>
    <person>
        <name>Jane Doe</name>
        <address>
            ...
        </address>
    </person>
    <person>
        <name>John Smith</name>
        <address>
            ...
        </address>
    </person>
    ....
</people>
Demo
import java.io.*;
import javax.xml.stream.*;
import javax.xml.bind.*;

public class Demo {

    public static void main(String[] args) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
        xsr.nextTag(); // Advance to the root <people> element

        JAXBContext jc = JAXBContext.newInstance(Person.class);
        Unmarshaller unmarshaller = jc.createUnmarshaller();
        while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
            // Unmarshal one <person> element at a time
            Person person = (Person) unmarshaller.unmarshal(xsr);
            // process person here
        }
    }

}
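Because each call to unmarshal(xsr) consumes exactly one person element before returning, only the current Person instance is reachable during each loop iteration; the rest of the document streams past without ever being built into objects.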
Person
Instead of matching on the root element of the XML document, we need to add an @XmlRootElement annotation on the local root of the XML fragment that we will be unmarshalling.
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Person {
}
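Mapped onto the document from the question, the same approach would look roughly like the sketch below. The WorkSet class, its @XmlRootElement mapping, and the file name are assumptions for illustration; they would need to match your actual generated classes and namespace.

import java.io.File;
import javax.xml.bind.*;
import javax.xml.stream.*;
import javax.xml.transform.stream.StreamSource;

public class ExportDemo {

    public static void main(String[] args) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader xsr = xif.createXMLStreamReader(
                new StreamSource(new File("SAMPLE.XML")));

        // WorkSet is a hypothetical class annotated with
        // @XmlRootElement(name = "WorkSet", namespace = "www.foo.com")
        // so that it matches the document's default namespace.
        JAXBContext jc = JAXBContext.newInstance(WorkSet.class);
        Unmarshaller unmarshaller = jc.createUnmarshaller();

        // Walk the stream and unmarshal each <WorkSet> as it is reached,
        // skipping everything else (<Origin>, comments, whitespace).
        while (xsr.hasNext()) {
            if (xsr.getEventType() == XMLStreamConstants.START_ELEMENT
                    && "WorkSet".equals(xsr.getLocalName())) {
                WorkSet workSet = (WorkSet) unmarshaller.unmarshal(xsr);
                // process workSet and its Work children here;
                // unmarshal() leaves the cursor just past </WorkSet>,
                // so loop back and re-check the current event.
            } else {
                xsr.next();
            }
        }
    }

}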
You could increase the heap space using the -Xmx startup argument.
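For example, to start the JVM with a 2 GB maximum heap (pick a value that fits your machine and data):

java -Xmx2g Demo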
For large files, SAX processing is more memory-efficient since it's event-driven and doesn't load the entire structure into memory.
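As a minimal sketch of the event-driven style, a SAX handler for the people/person document from the answer above might look like this (element names are taken from that sample; the printing logic is just for illustration):

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxDemo {

    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new File("input.xml"), new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                text.setLength(0); // reset the character buffer at each element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // React to events instead of building an object tree
                if ("name".equals(qName)) {
                    System.out.println("person name: " + text);
                }
            }
        });
    }

}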
I've been doing a lot of research, in particular on parsing very large input sets conveniently. It's true that you can combine StAX and JAXB to selectively parse XML fragments, but it's not always possible or preferable. If you're interested in reading more on the topic, please have a look at:
http://xml2java.net/documents/XMLParserTechnologyForProcessingHugeXMLfiles.pdf
In this document I describe an alternative approach that is very straightforward and convenient to use. It parses arbitrarily large input sets while giving you access to your data in a JavaBeans fashion.
Use SAX or StAX. But if the goal is to have an in-memory object representation of the file, you'll still need a lot of memory to hold the contents of such a big file. In that case, your only hope is to increase the heap size using the -Xmx1024m JVM option (which sets the maximum heap size to 1024 MB).