XSLT transformation on Large XML files with C#

Tags: c#, xml, xslt

I have some very large XML files (800 MB to 1.5 GB) and need to apply an XSLT transformation to them. I can read them with XmlTextReader, but when I apply the XSLT transformation I get a System.OutOfMemoryException.

My code looks like this:

using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

class Program
{
    static void Main(string[] args)
    {
        // The entire transformed output is built in memory as an XDocument.
        XDocument newTree = new XDocument();

        using (XmlTextReader oReader = new XmlTextReader(@"C:\Projects\myxml.xml"))
        using (XmlWriter writer = newTree.CreateWriter())
        {
            XslCompiledTransform oTransform = new XslCompiledTransform();
            oTransform.Load(@"C:\Projects\myXSLT.xsl");
            oTransform.Transform(oReader, writer);
        }
        Console.WriteLine(newTree);
    }
}

Thanks in advance; this is very urgent. If there is no other solution, I will have to split the XML into smaller files and transform each one separately.

jvm asked Jun 23 '10 11:06


2 Answers

XSLT uses XPath, which requires the whole XML document to be held in memory. Running out of memory on very large inputs is therefore inherent in the approach.

There are simple rules of thumb for estimating how much memory is needed; one of them says 5 * text-size.

So a "typical 1.5 GB XML file" needs roughly 5 * 1.5 GB = 7.5 GB, and 8 GB of RAM may be sufficient.

Either split the document into smaller parts or wait for an implementation of XSLT 2.1 (since published as XSLT 3.0), which defines special streaming instructions. In the meantime, one may use the latest (commercial) version of Saxon, which implements extensions for streaming; successful processing of a 64 GB document has been reported on Twitter.

Dimitre Novatchev answered Sep 20 '22 11:09


We faced a similar problem. The solution we came up with was not to use XSLT for this case, and instead to use LINQ to XML transformations while streaming the data. You can leverage the C# yield keyword to iterate through an XML stream and tackle the file piecemeal. See "streaming with LINQ to XML".
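
A minimal sketch of that idea, not our actual code. It assumes the large file is a flat list of repeating child elements under one root. The element names `Record`, `Item` and `Items`, the `id` attribute, and the helper names `StreamElements` and `TransformRecords` are all hypothetical and only there for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class XmlStreaming
{
    // Lazily yields each element with the given local name, one at a time,
    // so only a single element's subtree is materialized in memory.
    public static IEnumerable<XElement> StreamElements(XmlReader reader, string elementName)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == elementName)
            {
                // XNode.ReadFrom consumes the whole element and leaves the reader
                // on the node that follows it, so no extra Read() is needed here.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }

    // Hypothetical transformation: rewrites each <Record id="..."> as <Item id="...">
    // and writes it straight to the output instead of building a result tree.
    public static void TransformRecords(XmlReader input, XmlWriter output)
    {
        output.WriteStartElement("Items");
        foreach (XElement record in StreamElements(input, "Record"))
        {
            XElement item = new XElement("Item",
                new XAttribute("id", (string)record.Attribute("id") ?? ""),
                record.Value);
            item.WriteTo(output);
        }
        output.WriteEndElement();
    }
}
```

For real files, the reader and writer would typically come from XmlReader.Create(inputPath) and XmlWriter.Create(outputPath), each wrapped in a using block. Because both sides stream, memory use should depend on the size of the largest single record rather than on the size of the whole file.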

The nature of XSLT requires the XML to be loaded into memory, so you need to break the large file down into more manageable pieces. If you use the XML streaming technique, you can break the document up into sub-elements and apply the XSLT to each one individually. You may have to rewrite the XSLT to accommodate this behavior.
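
A rough sketch of applying the XSLT per sub-element, under some loud assumptions: the stylesheet has been rewritten so that each sub-element can be transformed on its own, it produces text output (or XML with the declaration omitted) so the pieces can simply be appended to one writer, and the repeating element's name is known. `TransformInChunks` and its parameters are hypothetical names used for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

static class ChunkedTransform
{
    // Streams through the input and runs the stylesheet once per matching
    // sub-element, appending each result to the shared output writer.
    public static void TransformInChunks(
        XmlReader input, XslCompiledTransform transform,
        string elementName, TextWriter output)
    {
        input.MoveToContent();
        while (!input.EOF)
        {
            if (input.NodeType == XmlNodeType.Element && input.LocalName == elementName)
            {
                // Only this one sub-element is loaded into memory.
                XElement chunk = (XElement)XNode.ReadFrom(input);
                using (XmlReader chunkReader = chunk.CreateReader())
                {
                    // The chunk becomes the document element seen by the stylesheet.
                    transform.Transform(chunkReader, null, output);
                }
            }
            else
            {
                input.Read();
            }
        }
    }
}
```

Note that inside each chunk the stylesheet can no longer see the rest of the document, so templates that match from the original root, or XPath expressions that look at siblings or ancestors, are exactly the parts that would need rewriting.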

Aside from this, the only other option is to throw more hardware at it, but this might even require an operating system upgrade depending on RAM limitations...

E Rolnicki answered Sep 19 '22 11:09