JAXB - unmarshal OutOfMemory: Java Heap Space

Tags: java, memory, xml

I'm currently trying to use JAXB to unmarshal an XML file, but the file seems to be too large (~500 MB) for the unmarshaller to handle. I keep getting java.lang.OutOfMemoryError: Java heap space at:

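JAXBContext jc = JAXBContext.newInstance("com.sample.xml");
Unmarshaller um = jc.createUnmarshaller();
// Builds the object graph for the entire document in memory at once
Export e = (Export) um.unmarshal(new File("SAMPLE.XML"));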

I'm guessing this is because it's trying to load the whole XML file into memory as a single object graph, and the file is just too large for the Java heap.

Is there a more memory-efficient way to parse large (~500 MB) XML files? Or perhaps an Unmarshaller property that would help me handle the large XML file?

Here's what my XML looks like:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- -->
<Export xmlns="wwww.foo.com" xmlns:xsi="www.foo1.com" xsi:schemaLocation="www.foo2.com/.xsd">
<!-- -->
<Origin ID="foooo" />
<!-- -->
<WorkSets>
   <WorkSet>
      <Work>
         .....
      </Work>
      <Work>
         ....
      </Work>
      <Work>
         .....
      </Work>
   </WorkSet>
   <WorkSet>
      ....
   </WorkSet>
</WorkSets>
</Export>

I'd like to unmarshal at the WorkSet level while still being able to read all of the Work elements in each WorkSet.

TyC asked Nov 01 '11 15:11


4 Answers

What does your XML look like? For large documents I typically recommend using a StAX XMLStreamReader so that JAXB can unmarshal the document in chunks.

input.xml

In the document below there are many instances of the person element. We can use JAXB with a StAX XMLStreamReader to unmarshal the corresponding Person objects one at a time to avoid running out of memory.

<people>
   <person>
       <name>Jane Doe</name>
       <address>
           ...
       </address>
   </person>
   <person>
       <name>John Smith</name>
       <address>
           ...
       </address>
   </person>
   ....
</people>

Demo

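import java.io.*;
import javax.xml.stream.*;
import javax.xml.bind.*;

public class Demo {

    public static void main(String[] args) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        // A byte stream lets the parser honor the encoding declared in the XML prolog
        try (InputStream in = new FileInputStream("input.xml")) {
            XMLStreamReader xsr = xif.createXMLStreamReader(in);
            xsr.nextTag(); // Advance to the root <people> element
            xsr.nextTag(); // Advance to the first <person> element

            JAXBContext jc = JAXBContext.newInstance(Person.class);
            Unmarshaller unmarshaller = jc.createUnmarshaller();
            while (xsr.getEventType() == XMLStreamConstants.START_ELEMENT) {
                // Unmarshal only the current <person> subtree
                Person person = (Person) unmarshaller.unmarshal(xsr);
                // ... process person here; it can be garbage collected afterwards ...

                // JAXB leaves the reader on the event right after </person>, so skip
                // whitespace and comments until the next start tag or </people>
                while (!xsr.isStartElement() && !xsr.isEndElement()) {
                    xsr.next();
                }
            }
            xsr.close();
        }
    }

}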

Person

Instead of annotating the class for the root element of the whole XML document, we add an @XmlRootElement annotation to the class that corresponds to the local root of each XML fragment we unmarshal (here, person).

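import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement // default element name "person" matches the fragment's local root
public class Person {
    // map child elements such as name and address here
}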
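Applied to the question's layout, the same chunking idea can also be sketched with plain StAX from the JDK, without JAXB. This is a minimal sketch, not part of the answer above: the class name WorkSetStreamer, the callback, and the assumption that each <Work> contains only text are all hypothetical. Only the current WorkSet's data is held in memory while the reader walks the file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class WorkSetStreamer {

    // Walks the document once and hands each <WorkSet>'s Work texts to the callback;
    // only the current WorkSet's data is held in memory. Returns the WorkSet count.
    static int streamWorkSets(XMLStreamReader xsr, Consumer<List<String>> handler)
            throws XMLStreamException {
        int count = 0;
        while (xsr.hasNext()) {
            if (xsr.next() == XMLStreamConstants.START_ELEMENT
                    && "WorkSet".equals(xsr.getLocalName())) {
                handler.accept(readWorkSet(xsr));
                count++;
            }
        }
        return count;
    }

    // Consumes events up to the matching </WorkSet>, collecting the text of each <Work>.
    // Hypothetical assumption: <Work> holds text only (getElementText() throws otherwise).
    private static List<String> readWorkSet(XMLStreamReader xsr) throws XMLStreamException {
        List<String> works = new ArrayList<>();
        while (xsr.hasNext()) {
            int event = xsr.next();
            if (event == XMLStreamConstants.START_ELEMENT && "Work".equals(xsr.getLocalName())) {
                works.add(xsr.getElementText().trim());
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && "WorkSet".equals(xsr.getLocalName())) {
                break;
            }
        }
        return works;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Tiny inline stand-in for the question's document; the namespace URI is a placeholder
        String xml = "<Export xmlns=\"urn:example:foo\"><Origin ID=\"foooo\"/><WorkSets>"
                + "<WorkSet><Work>a</Work><Work>b</Work></WorkSet>"
                + "<WorkSet><Work>c</Work></WorkSet>"
                + "</WorkSets></Export>";
        XMLStreamReader xsr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int total = streamWorkSets(xsr, works -> System.out.println("WorkSet: " + works));
        xsr.close();
        System.out.println(total + " WorkSets");
    }
}
```

For the real file, the reader would come from `xif.createXMLStreamReader(new FileInputStream("SAMPLE.XML"))`. If JAXB classes for WorkSet exist (a hypothetical generated class here), readWorkSet could be swapped for `unmarshaller.unmarshal(xsr, WorkSet.class).getValue()`; the outer loop would then need to check the current event before calling next(), because JAXB leaves the reader on the event after the end tag.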
bdoughan answered Nov 20 '22 07:11


You could increase the heap space using the -Xmx startup argument.

For large files, SAX processing is more memory-efficient since it's event-driven and doesn't load the entire structure into memory.
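A minimal SAX sketch of that event-driven style, assuming the question's element names: the handler keeps only a running counter for the current WorkSet, so memory use stays small regardless of file size (apart from one integer per WorkSet). WorkCounter and count() are hypothetical names; a real handler would do its per-Work processing inside the callbacks.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class WorkCounter extends DefaultHandler {

    // Number of <Work> elements seen in each <WorkSet>, in document order
    final List<Integer> workCounts = new ArrayList<>();
    private int current;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("WorkSet".equals(localName)) {
            current = 0;            // a new WorkSet starts
        } else if ("Work".equals(localName)) {
            current++;              // one more Work in the current WorkSet
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("WorkSet".equals(localName)) {
            workCounts.add(current); // WorkSet finished; record its count
        }
    }

    static List<Integer> count(byte[] xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so localName is populated
        SAXParser parser = factory.newSAXParser();
        WorkCounter handler = new WorkCounter();
        parser.parse(new ByteArrayInputStream(xml), handler);
        return handler.workCounts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Export xmlns=\"urn:example:foo\"><WorkSets>"
                + "<WorkSet><Work>a</Work><Work>b</Work></WorkSet>"
                + "<WorkSet><Work>c</Work></WorkSet>"
                + "</WorkSets></Export>";
        System.out.println(count(xml.getBytes(StandardCharsets.UTF_8))); // prints [2, 1]
    }
}
```

For the real file, `parser.parse(new File("SAMPLE.XML"), handler)` streams directly from disk.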

Dave Newton answered Nov 20 '22 08:11


I've been doing a lot of research, in particular with regard to parsing very large input sets conveniently. It's true that you could combine StAX and JAXB to selectively parse XML fragments, but that's not always possible or preferable. If you're interested in reading more on the topic, please have a look at:

http://xml2java.net/documents/XMLParserTechnologyForProcessingHugeXMLfiles.pdf

In this document I describe an alternative approach that is very straightforward and convenient to use. It parses arbitrarily large input sets while giving you access to your data in a JavaBeans fashion.

Lolke Dijkstra answered Nov 20 '22 08:11


Use SAX or StAX. But if the goal is to have an in-memory object representation of the whole file, you'll still need a lot of memory to hold the contents of such a big file. In that case, your only hope is to increase the heap size using the -Xmx1024m JVM option (which sets the maximum heap size to 1024 MB).
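To check that an -Xmx setting actually reached the JVM (easy to get wrong when launching from an IDE or a build script), a small sketch like the following prints the maximum heap size the runtime reports. The class name HeapInfo is just for illustration.

```java
public class HeapInfo {

    // Maximum heap the JVM will attempt to use, in MB. This is governed by -Xmx,
    // though the reported figure can be somewhat below the -Xmx value.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

Run it as `java -Xmx1024m HeapInfo`; the printed figure should be close to, and often slightly below, 1024 MB.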

JB Nizet answered Nov 20 '22 06:11