
java.lang.OutOfMemoryError while transforming XML in a huge directory

I want to transform XML files using XSLT 2.0 in a huge directory tree with many levels. There are more than 1 million files, each 4 to 10 kB. After a while I always get java.lang.OutOfMemoryError: Java heap space.

My command is: java -Xmx3072M -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=512M ...

Adding more memory via -Xmx is not a good solution.

Here is my code:

for (File file : dir.listFiles()) {
    if (file.isDirectory()) {
        pushDocuments(file);
    } else {
        indexFiles.index(file);
    }
}

public void index(File file) {
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

    try {
        xslTransformer.xslTransform(outputStream, file);
        outputStream.flush();
        outputStream.close();
    } catch (IOException e) {
        System.err.println(e.toString());
    }
}

The XSLT transformation uses net.sf.saxon.s9api:

public void xslTransform(ByteArrayOutputStream outputStream, File xmlFile) {
    try {
        XdmNode source = proc.newDocumentBuilder().build(new StreamSource(xmlFile));
        Serializer out = proc.newSerializer();
        out.setOutputStream(outputStream);
        transformer.setInitialContextNode(source);
        transformer.setDestination(out);
        transformer.transform();

        out.close();
    } catch (SaxonApiException e) {
        System.err.println(e.toString());
    }
}
Asked Nov 04 '13 08:11 by wonder garance




1 Answer

My usual recommendation with the Saxon s9api interface is to reuse the XsltExecutable object, but to create a new XsltTransformer for each transformation. The XsltTransformer caches documents you have read in case they are needed again, which is not what you want in this case.
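The "compile once, new transformer per file" pattern can be sketched as follows. To keep the sketch runnable without Saxon on the classpath, it swaps in the JDK's built-in JAXP API: Templates stands in for XsltExecutable and Transformer for XsltTransformer. The class name PerFileTransformer and its methods are hypothetical, and the JDK's default processor handles XSLT 1.0 only, so treat this as an illustration of the object lifetimes rather than a drop-in fix.

```java
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

// Hypothetical helper: holds the compiled stylesheet, never a transformer.
class PerFileTransformer {
    // Compiled once and reused for every input document
    // (the JAXP counterpart of Saxon's XsltExecutable).
    private final Templates templates;

    PerFileTransformer(String stylesheet) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        this.templates = factory.newTemplates(
                new StreamSource(new StringReader(stylesheet)));
    }

    // Transforms one document. The Transformer is created here and becomes
    // garbage as soon as the method returns, so no per-document state
    // accumulates across calls.
    String transform(String xml) throws TransformerException {
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(
                new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }
}
```

In s9api terms, the field would hold the XsltExecutable returned by XsltCompiler.compile(...), and each call would obtain its own XsltTransformer from XsltExecutable.load() instead of sharing one transformer across the whole directory walk.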

As an alternative, you could call xsltTransformer.getUnderlyingController().clearDocumentPool() after each transformation.

(Please note, you can ask Saxon questions at saxonica.plan.io, which gives a good chance we [Saxonica] will notice them and answer them. You can also ask them here and tag them "saxon", which means we'll probably respond to the question at some point, though not always immediately. If you ask on StackOverflow with no product-specific tags, it's entirely hit-and-miss whether anyone will notice the question.)

Answered Oct 13 '22 08:10 by Michael Kay