I want to transform XML files with XSLT 2.0 in a huge directory tree with many levels. There are more than 1 million files, each 4 to 10 kB. After a while I always get java.lang.OutOfMemoryError: Java heap space.
My command is: java -Xmx3072M -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=512M ...
Adding more memory with -Xmx is not a good solution.
Here is my code:
for (File file : dir.listFiles()) {
    if (file.isDirectory()) {
        pushDocuments(file);
    } else {
        indexFiles.index(file);
    }
}
public void index(File file) {
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    try {
        xslTransformer.xslTransform(outputStream, file);
        outputStream.flush();
        outputStream.close();
    } catch (IOException e) {
        System.err.println(e.toString());
    }
}
The XSLT transformation uses net.sf.saxon.s9api:
public void xslTransform(ByteArrayOutputStream outputStream, File xmlFile) {
    try {
        XdmNode source = proc.newDocumentBuilder().build(new StreamSource(xmlFile));
        Serializer out = proc.newSerializer();
        out.setOutputStream(outputStream);
        transformer.setInitialContextNode(source);
        transformer.setDestination(out);
        transformer.transform();
        out.close();
    } catch (SaxonApiException e) {
        System.err.println(e.toString());
    }
}
Prevention (for java.lang.OutOfMemoryError: Metaspace): If MaxMetaspaceSize has been set on the command line, increase its value. Metaspace is allocated from the same address space as the Java heap, so reducing the size of the Java heap makes more space available for Metaspace.
java.lang.OutOfMemoryError: Usually, this error is thrown when there is insufficient space to allocate an object in the Java heap. In this case, the garbage collector cannot make space available to accommodate a new object, and the heap cannot be expanded further.
The garbage collector is responsible for clearing heap space. Its task is to find all objects that are no longer referenced and reclaim their memory. Objects that are still reachable, such as cached documents, cannot be reclaimed. The JVM normally runs the garbage collector periodically so that space is available for new objects.
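One way to check whether the collector really cannot free space is to log heap usage at intervals during the run. The sketch below is only an illustration using the standard java.lang.Runtime methods. The class name HeapProbe, the counter filesDone, and the 10,000-file interval are hypothetical and not part of the original code.

```java
final class HeapProbe {

    /** Heap bytes currently in use: allocated heap minus the free part of it. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** One-line progress report; filesDone is a hypothetical counter kept by the caller. */
    static String report(long filesDone) {
        long mb = 1024L * 1024L;
        long usedMb = usedHeapBytes() / mb;
        long maxMb = Runtime.getRuntime().maxMemory() / mb; // upper bound set by -Xmx
        return String.format("%d files, heap used %d MB of max %d MB", filesDone, usedMb, maxMb);
    }

    public static void main(String[] args) {
        for (long filesDone = 1; filesDone <= 30_000; filesDone++) {
            // ... transform one file here (placeholder for the real work) ...
            if (filesDone % 10_000 == 0) {
                System.out.println(report(filesDone));
            }
        }
    }
}
```

If the reported figure keeps climbing in step with the number of files processed, something is holding references to per-file data, and a larger -Xmx only delays the error.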
My usual recommendation with the Saxon s9api interface is to reuse the XsltExecutable object, but to create a new XsltTransformer for each transformation. The XsltTransformer caches documents you have read in case they are needed again, which is not what you want in this case.
As an alternative, you could call xsltTransformer.getUnderlyingController().clearDocumentPool() after each transformation.
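The "compile once, new transformer per file" pattern can be sketched as follows. To keep the example runnable without Saxon on the classpath, it uses the JDK's JAXP API (javax.xml.transform) rather than s9api, and the JDK's built-in processor only supports XSLT 1.0. Here Templates plays the role of the reusable XsltExecutable, and the Transformer returned by newTransformer() plays the role of the XsltTransformer that s9api creates with XsltExecutable.load(). The class name and the inline stylesheet are hypothetical, for illustration only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class PerDocumentTransformer {

    // Hypothetical stylesheet: outputs the text content of the <doc> root element.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\"><xsl:value-of select=\"/doc\"/></xsl:template>"
      + "</xsl:stylesheet>";

    /** Compile the stylesheet once; the resulting Templates object is reused for every file. */
    static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("stylesheet did not compile", e);
        }
    }

    /** Transform one document with a fresh Transformer, so no per-document state is carried over. */
    static String transform(Templates templates, String xml) {
        try {
            Transformer transformer = templates.newTransformer(); // new instance per document
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        Templates templates = compile(XSL); // compiled once, outside the loop
        String[] documents = { "<doc>a</doc>", "<doc>b</doc>" };
        for (String xml : documents) {
            System.out.println(transform(templates, xml));
        }
    }
}
```

In the code from the question, the equivalent change would be to keep the compiled XsltExecutable in a field and create the transformer inside xslTransform, instead of reusing one transformer field for all files.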
(Please note, you can ask Saxon questions at saxonica.plan.io, which gives a good chance we [Saxonica] will notice them and answer them. You can also ask them here and tag them "saxon", which means we'll probably respond to the question at some point, though not always immediately. If you ask on StackOverflow with no product-specific tags, it's entirely hit-and-miss whether anyone will notice the question.)