 

XSLT Transform Efficiency

Tags: .net, xslt

I am a support engineer and our company's product allows XSLT transforms to customize outputs.

I wrote an XSL transform for this purpose. It works well for source files of typical size (several hundred KB), but occasionally a really huge (10 MB) source file comes along. In that case, the output is not generated even if I let it grind for several days.

The SW engineering team tested it and found that, for this transform and the large source file in question, processing is indeed very slow (days) when our product is compiled to use the transform engine in .NET 1.1, but plenty fast (about 1-2 minutes) when it is compiled with .NET 2.0.

The long-term solution, obviously, is to wait for the next release.

For the short term, I am wondering the following: 1) Is XSLT flexible enough that there are more efficient and less efficient ways to achieve the same result? For example, is it possible that, the way I structured the XSL, the transform engine has to iterate from the beginning of the source file many, many times, taking longer and longer as each result piece gets farther from the beginning (Schlemiel the Painter)? Or 2) does it depend more on how the transform engine interprets the XSL?

If 2 is the case, I don't want to waste a lot of time trying to improve the xsl (I am not a big xsl genius, it was hard enough for me to achieve what little I did...).

Thanks!

KnomDeGuerre asked Oct 22 '08 19:10



3 Answers

I'm not familiar with the .NET implementations, but there are a few things you can do in general to speed up processing of large documents:

  • Avoid using "//" in XPath expressions unless absolutely necessary; it makes the processor search the whole subtree.
  • If you only need the first element that matches an XPath expression, use the "[1]" qualifier, e.g. "(//iframe)[1]". (Without the parentheses, "//iframe[1]" selects every iframe that is the first iframe child of its parent.) Many processors implement optimizations for this.
  • Whenever possible, when dealing with huge XML input, see if you can design a solution around a stream-based parser (like SAX) instead of a DOM-based parser.
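
The stream-based approach in the last bullet can be sketched outside XSLT. The following is a minimal illustration in Python (not .NET, and not part of the original answer) using the standard library's xml.sax module; the class name ElementCounter, the function count_elements, and the sample document are hypothetical. It counts elements as the parser reports them, without ever building a tree of the whole document.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """SAX handler that counts start tags with a given name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per opening tag, in document order.
        if name == self.tag:
            self.count += 1


def count_elements(source, tag):
    """Count <tag> elements in a file name or file-like object, streaming."""
    handler = ElementCounter(tag)
    xml.sax.parse(source, handler)
    return handler.count


sample = io.BytesIO(b'<t><x c1="1"/><x c1="1"/><x c1="2"/></t>')
print(count_elements(sample, "x"))  # 3
```

Because the handler only keeps a counter, memory use does not grow with the size of the input, which is the property that matters for very large files.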

Marco answered Oct 15 '22 04:10


Normally, if you see a non-linear increase in processing time vs. input size, you should suspect your code more than the framework. But since the problem goes away when the tool is compiled with .NET 2.0, all bets are off.

With XSLT, it's hard to create a non-linear performance curve if you do all your parsing with straight template matches:

<xsl:template match="foo">
  <!--OUTPUT-->
  <xsl:apply-templates/>
  <!--OUTPUT-->
</xsl:template>

<xsl:template match="bar">
  <!--OUTPUT-->
  <xsl:apply-templates/>
  <!--OUTPUT-->
</xsl:template>

Pay careful attention to anywhere you might have resorted to <xsl:for-each> for parsing; template matches are virtually always a better way to achieve the same result.

One way to troubleshoot this performance problem is to recreate your XSLT one template-match at a time, testing the processing time after adding each match. You might start with this match:

<xsl:template match="*">
  <xsl:copy>                   <!--Copy node                   -->
    <xsl:copy-of select="@*"/> <!--Copy node attributes         -->
    <xsl:apply-templates />    <!--Process children             -->
  </xsl:copy>
</xsl:template>

This will match every element, one at a time, and copy it with its attributes to a new document; text nodes are copied through by XSLT's built-in template rule. This should not exhibit a non-linear increase in processing time vs. input size (if it does, then the problem is not with your XSLT code).

As you recreate your XSLT, if you add a template-match that suddenly kills performance, comment out every block inside the template. Then, uncomment one block at a time, testing the processing time each iteration, until you find the block that causes the problem.
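
To judge whether a measured timing is "non-linear", it can help to time the same stylesheet on two input sizes and estimate the growth exponent. The sketch below is in Python and is only an illustration; the function name growth_exponent and all timings are hypothetical, not measurements from this thread.

```python
import math


def growth_exponent(size_a, seconds_a, size_b, seconds_b):
    """Estimate k in time ~ size**k from two (size, seconds) measurements."""
    return math.log(seconds_b / seconds_a) / math.log(size_b / size_a)


# Hypothetical timings, for illustration only.
print(round(growth_exponent(1_000, 0.5, 2_000, 1.0), 2))  # 1.0, roughly linear
print(round(growth_exponent(1_000, 0.5, 2_000, 2.0), 2))  # 2.0, roughly quadratic
```

An exponent near 1 after adding a template suggests that template is harmless; a jump toward 2 points at the block that was just added.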

trebormf answered Oct 15 '22 04:10


To detect when to start a new section, I did this:

<xsl:if test="@TheFirstCol>preceding-sibling::*[1]/@TheFirstCol">

Could this be causing a lot of re-iteration?

Definitely. The algorithm you've chosen is O(N^2) and would be very slow with a sufficient number of siblings, regardless of the implementation language.
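
To make the quadratic cost concrete, here is a small Python sketch (an illustration, not XSLT and not the engine's actual code). It assumes a naive engine that reaches the preceding sibling by walking from the first sibling every time; that behaviour is an assumption, not a documented fact about any particular processor. The function names are hypothetical, and the first item is treated as a section start purely as a choice for the sketch.

```python
def steps_rescanning(values):
    """Find section starts, walking from the first sibling for every item."""
    steps = 0
    starts = []
    for i, value in enumerate(values):
        previous = None
        # Naive lookup: walk from the first sibling up to the one before i.
        for j in range(i):
            previous = values[j]
            steps += 1
        if previous is None or value > previous:
            starts.append(i)
    return starts, steps


def steps_single_pass(values):
    """Find the same section starts, remembering the previous value instead."""
    steps = 0
    starts = []
    previous = None
    for i, value in enumerate(values):
        steps += 1
        if previous is None or value > previous:
            starts.append(i)
        previous = value
    return starts, steps


values = [1, 1, 2, 2, 2, 3]
print(steps_rescanning(values))   # ([0, 2, 5], 15)
print(steps_single_pass(values))  # ([0, 2, 5], 6)
```

With N items the rescanning version takes N(N-1)/2 steps, so doubling the input roughly quadruples the work, while the single pass takes N steps.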

Here is an efficient algorithm using keys:

Solution1:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:output method="text"/>

 <xsl:key name="kC1Value" match="@c1" use="."/>

    <xsl:template match="/">
      <xsl:for-each select="*/x[generate-id(@c1) = generate-id(key('kC1Value',@c1)[1])]">

       <xsl:value-of select="concat('&#xA;',@c1)"/>

       <xsl:for-each select="key('kC1Value',@c1)">
         <xsl:value-of select="'&#xA;'"/>
         <xsl:for-each select="../@*[not(name()='c1')]">
           <xsl:value-of select="concat('   ', .)"/>
         </xsl:for-each>
       </xsl:for-each>
      </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

Unfortunately, XslTransform (.Net 1.1) has a notoriously inefficient implementation of the generate-id() function.

The following may be faster with XslTransform:

Solution2:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:output method="text"/>

 <xsl:key name="kC1Value" match="@c1" use="."/>

    <xsl:template match="/">
      <xsl:for-each select="*/x[count(@c1 | key('kC1Value',@c1)[1]) = 1]">

       <xsl:value-of select="concat('&#xA;',@c1)"/>

       <xsl:for-each select="key('kC1Value',@c1)">
         <xsl:value-of select="'&#xA;'"/>
         <xsl:for-each select="../@*[not(name()='c1')]">
           <xsl:value-of select="concat('   ', .)"/>
         </xsl:for-each>
       </xsl:for-each>
      </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

When applied on the following small XML document:

<t>
 <x c1="1" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="1" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="1" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="1" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="2" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="2" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="2" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="2" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="3" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="3" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="3" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="3" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="3" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="3" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="3" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="3" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="4" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="4" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="4" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="4" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="5" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="5" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="5" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="5" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="5" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="5" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="6" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="6" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="6" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="6" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="6" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="6" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="7" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="7" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="7" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="7" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="8" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="8" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="8" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="8" c2="1" c3="1" c4="0" c5="0"/>
</t>

both solutions produced the wanted result:

1
   0   0   0   0
   0   1   0   0
   2   0   0   0
   1   1   0   0
2
   0   0   0   0
   0   1   0   0
   2   0   0   0
   1   1   0   0
3
   0   0   0   0
   0   1   0   0
   2   0   0   0
   1   1   0   0
   0   0   0   0
   0   1   0   0
   2   0   0   0
   1   1   0   0
4
   0   0   0   0
   0   1   0   0
   2   0   0   0
   1   1   0   0
5
   0   0   0   0
   0   1   0   0
   2   0   0   0
   1   1   0   0
   0   0   0   0
   0   1   0   0
6
   2   0   0   0
   1   1   0   0
   0   0   0   0
   0   1   0   0
   2   0   0   0
   1   1   0   0
7
   0   0   0   0
   0   1   0   0
   2   0   0   0
   1   1   0   0
8
   0   0   0   0
   0   1   0   0
   2   0   0   0
   1   1   0   0
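
For readers less familiar with xsl:key, the grouping that both stylesheets perform can be mirrored in Python as a rough analogy: the key acts like a lookup table built once, so each group is found without rescanning the document. This is a sketch only; the function name group_by_c1 is hypothetical, the sample is a shortened version of the document above, and the output omits the leading newline that the stylesheets emit.

```python
import xml.etree.ElementTree as ET


def group_by_c1(xml_text):
    """Group <x> elements by @c1 in one pass, like an xsl:key lookup table."""
    root = ET.fromstring(xml_text)
    index = {}  # c1 value -> list of <x> elements, in document order
    for x in root.findall("x"):
        index.setdefault(x.get("c1"), []).append(x)

    lines = []
    for c1, members in index.items():  # dicts keep insertion order
        lines.append(c1)
        for x in members:
            # Every attribute except c1, each prefixed with three spaces.
            others = [v for name, v in x.attrib.items() if name != "c1"]
            lines.append("".join("   " + v for v in others))
    return "\n".join(lines)


sample = """<t>
 <x c1="1" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="1" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="2" c2="2" c3="0" c4="0" c5="0"/>
</t>"""
print(group_by_c1(sample))
```

The index is built in a single pass over the elements, which is why the work grows roughly in proportion to the input size.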

From the above small XML file I generated a 10MB XML file by copying every element 6250 times (using another XSLT transformation :) ).
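
The answer generated the large file with another XSLT transformation, which is not shown. An equivalent way to build such a test file, sketched here in Python with a hypothetical helper named repeat_children, is to repeat each child of the root a fixed number of times in a row.

```python
import copy
import xml.etree.ElementTree as ET


def repeat_children(xml_text, times):
    """Return a root element whose children each appear `times` times in a row."""
    root = ET.fromstring(xml_text)
    big = ET.Element(root.tag, root.attrib)
    for child in root:
        for _ in range(times):
            # Each copy is independent of the original element.
            big.append(copy.deepcopy(child))
    return big


small = '<t><x c1="1" c2="0"/><x c1="2" c2="1"/></t>'
big = repeat_children(small, 3)
print(len(big))                    # 6
print([x.get("c1") for x in big])  # ['1', '1', '1', '2', '2', '2']
```

Repeating each element in place keeps equal c1 values adjacent, as they are in the small document. The result can be written out with ET.ElementTree(big).write(...) if a file on disk is needed.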

With the 10MB XML file and XslCompiledTransform (.NET 2.0+), the two solutions had the following transformation times:

Solution1: 3.3 sec.
Solution2: 2.8 sec.

With XslTransform (.NET 1.1), Solution2 ran for 1622 sec., that is, about 27 minutes.
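
As a quick check of the figures reported for Solution2, the arithmetic can be done in a few lines of Python; only the two timings come from the answer, and the variable names are made up for this sketch.

```python
compiled_seconds = 2.8   # Solution2 with XslCompiledTransform
legacy_seconds = 1622.0  # Solution2 with XslTransform

print(round(legacy_seconds / 60))                # 27 (minutes)
print(round(legacy_seconds / compiled_seconds))  # 579 (times slower)
```

So on this input the older engine is several hundred times slower for the same stylesheet.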

Dimitre Novatchev answered Oct 15 '22 02:10