
Xalan XSLT - Out of Memory Heap Space

My project has a reporting module that gathers data from the database in the form of XML and runs an XSLT on it to generate the user's desired format of report. Options at this point are HTML and CSV.

We use Java and Xalan to do all interaction with the data.

The problem is that one of the reports a user can request is 143 MB (about 430,000 records) for the XML portion alone. When this is transformed into HTML, I run out of heap space even with a maximum heap of 4096 MB. This is unacceptable.

It seems that the problem is simply too much data, but I can't help but think there is a better way to deal with this than limiting the customer and not being able to meet functional requirements.

I am glad to give more information as needed, but I cannot disclose too much about the project, as I'm sure most of you understand. Also, to preempt the obvious question: yes, I need all of the data at the same time; I cannot paginate it.

Thanks

EDIT

All the transformation classes I am using are in the javax.xml.transform package. The implementation looks like this:

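// xsl and xml are Strings holding the complete stylesheet and source document.
final Transformer transformer =
  TransformerFactory.newInstance().newTransformer(
    new StreamSource(new StringReader(xsl)));
// The whole result is accumulated in memory before being returned as a String.
final StringWriter outWriter = new StringWriter();
transformer.transform(
  new StreamSource(new StringReader(xml)), new StreamResult(outWriter));
return outWriter.toString();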

If possible, I would like to leave the XSLT the way it is. I expected that using StreamSource would allow some of the data to be garbage-collected as it is processed, but I'm not sure what limitations on the XSLT (functions, etc.) this might require for proper cleanup to happen. If someone could point me at a resource detailing those limitations, it would be very helpful.

Andy asked Jan 30 '12 22:01


2 Answers

The problem is that a conventional XSLT processor such as Xalan needs a tree representation of the whole source document in memory while doing the transformation, even when the input is supplied as a StreamSource. In the code above, the result document is also held in memory in full, because it is collected in a StringWriter. For large XML files this is a serious problem.

You are interested in a system that allows a streaming transformation, where the full documents do not have to reside in memory. Maybe STX is an option: see http://www.xml.com/pub/a/2003/02/26/stx.html and http://stx.sourceforge.net/. It is quite similar to XSLT, so if your XSLT stylesheet is applied to the XML in a straightforward manner, rewriting it in STX could be quite simple.
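To illustrate what "streaming" means here, the following is a minimal sketch that turns records into CSV lines one at a time, so memory use does not grow with the number of records. Note that it uses plain StAX from the JDK, not STX or XSLT, so it is a different technique shown only to make the idea concrete. The `record` element and its `id` and `name` attributes are hypothetical, since the real schema is not given in the question, and no CSV escaping is attempted.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StreamingCsvSketch {

    // Reads the XML as a sequence of events and writes one CSV line per
    // <record> element. Only the current event is held in memory.
    // Assumption: records look like <record id="..." name="..."/>.
    static void recordsToCsv(Reader xml, Writer csv)
            throws XMLStreamException, IOException {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(xml);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "record".equals(reader.getLocalName())) {
                    csv.write(attr(reader, "id"));
                    csv.write(',');
                    csv.write(attr(reader, "name"));
                    csv.write('\n');
                }
            }
        } finally {
            reader.close();
        }
    }

    // Returns the named attribute of the current element, or "" if absent.
    private static String attr(XMLStreamReader reader, String name) {
        String value = reader.getAttributeValue(null, name);
        return value == null ? "" : value;
    }
}
```

In practice the Reader and Writer would wrap files or database streams rather than in-memory strings. The trade-off is that the report logic moves out of the stylesheet and into Java code, which is exactly what the asker hoped to avoid.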

Mathias Schwarz answered Sep 22 '22 23:09


We were able to improve this by doing two things.

  1. We write the XML source and the transformed output to files in a temp directory instead of holding them in memory. This keeps the initial creation and storage out of RAM; since the data comes from a database and the result is written back to the database, a handle to each file is all that's necessary.

  2. We use the Saxon transformer from Saxonica. This allows for a couple of things, including SAX-style transformations and the use of XSLT 2.0, which Xalan does not support.
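The first step can be sketched roughly as follows, using only the javax.xml.transform classes from the question. The class and method names are hypothetical, and this is a sketch under assumptions rather than our actual code. It keeps the source and result Strings off the heap, but the processor may still build its own in-memory tree of the source, so it reduces memory use rather than making the transformation fully streaming.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FileToFileTransform {

    // Transforms xmlFile with the stylesheet in xslFile and writes the
    // result to outFile, without holding input or output as a String.
    static void transform(Path xslFile, Path xmlFile, Path outFile)
            throws TransformerException, IOException {
        // Uses whichever JAXP implementation is configured. Assumption: to
        // pick Saxon, put it on the classpath and set the system property
        // javax.xml.transform.TransformerFactory to
        // net.sf.saxon.TransformerFactoryImpl.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(xslFile.toFile()));

        // The result is serialized straight to the file as it is produced.
        try (OutputStream out = Files.newOutputStream(outFile)) {
            transformer.transform(
                new StreamSource(xmlFile.toFile()), new StreamResult(out));
        }
    }
}
```

The caller would create the temp files with something like Files.createTempFile, fill the XML file from the database, call transform, and then stream the output file back to the database before deleting both files.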

Andy answered Sep 24 '22 23:09