Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I speed up my 'divide and conquer' XSLT template which replaces certain characters in a string?

UPDATE: I added an answer to this question which incorporates almost all the suggestions which have been given. The original template given in the code below needed 45605ms to finish a real world input document (english text about script programming). The revised template in the community wiki answer brought the runtime down to 605ms!

I'm using the following XSLT template for replacing a few special characters in a string with their escaped variants; it calls itself recursively using a divide-and-conquer strategy, eventually looking at every single character in a given string. It then decides whether the character should be printed as it is, or whether any form of escaping is necessary:

<xsl:template name="escape-text">
<xsl:param name="s" select="."/>
<xsl:param name="len" select="string-length($s)"/>
<xsl:choose>
    <xsl:when test="$len >= 2">
        <xsl:variable name="halflen" select="round($len div 2)"/>
        <xsl:variable name="left">
            <xsl:call-template name="escape-text">
                <xsl:with-param name="s" select="substring($s, 1, $halflen)"/>
                <xsl:with-param name="len" select="$halflen"/>
            </xsl:call-template>
        </xsl:variable>
        <xsl:variable name="right">
            <xsl:call-template name="escape-text">
                <xsl:with-param name="s" select="substring($s, $halflen + 1)"/>
                <xsl:with-param name="len" select="$halflen"/>
            </xsl:call-template>
        </xsl:variable>
        <xsl:value-of select="concat($left, $right)"/>
    </xsl:when>
    <xsl:otherwise>
        <xsl:choose>
            <xsl:when test="$s = '&quot;'">
                <xsl:text>&quot;\&quot;&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '@'">
                <xsl:text>&quot;@&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '|'">
                <xsl:text>&quot;|&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '#'">
                <xsl:text>&quot;#&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '\'">
                <xsl:text>&quot;\\&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '}'">
                <xsl:text>&quot;}&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '&amp;'">
                <xsl:text>&quot;&amp;&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '^'">
                <xsl:text>&quot;^&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '~'">
                <xsl:text>&quot;~&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '/'">
                <xsl:text>&quot;/&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '{'">
                <xsl:text>&quot;{&quot;</xsl:text>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$s"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:otherwise>
</xsl:choose>
</xsl:template>

This template accounts for the majority of runtime which my XSLT script needs. Replacing the above escape-text template with just

<xsl:template name="escape-text">
    <xsl:param name="s" select="."/>
    <xsl:value-of select="$s"/>
</xsl:template>

makes the runtime of my XSLT script go from 45 seconds to less than one seconds on one of my documents.

Hence my question: how can I speed up my escape-text template? I'm using xsltproc and I'd prefer a pure XSLT 1.0 solution. XSLT 2.0 solutions would be welcome too. However, external libraries might not be useful for this project - I'd still be interested in any solutions using them though.

like image 958
Frerich Raabe Avatar asked Aug 24 '10 15:08

Frerich Raabe


People also ask

How do I replace text in XSLT?

With the help of the XSLT2. 0 replace () function removes undesired characters from a string in powerful regular expressions. XSLT replaces corresponding single characters, not the entire string. The dollar sign ($) interprets the rightmost characters in the string as literal.

How do I remove special characters from a string in XSLT?

The inner translate( ) removes all characters of interest (e.g., numbers) to obtain a from string for the outer translate( ) , which removes these non-numeric characters from the original string.

What is string length in XSLT?

string-length() Function — Returns the number of characters in the string passed in as the argument to this function. If no argument is specified, the context node is converted to a string and the length of that string is returned.

How do you substring values in XSLT?

Substring is primarily used to return a section of a string or truncating a string with the help of the length provided. XSLT is incorporated with XPath while manipulating strings of text when XSLT access. The Substring function takes a string as an argument with the numbers one or two and an optional length.


2 Answers

Another (complementary) strategy would be to terminate the recursion early, before the string length is down to 1, if the condition translate($s, $vChars, '') = $s is true. This should give much faster processing of strings that contain no special characters at all, which is probably the majority of them. Of course the results will depend on how efficient xsltproc's implementation of translate() is.

like image 87
Michael Kay Avatar answered Oct 21 '22 21:10

Michael Kay


A very small correction improved the speed in my tests about 17 times.

There are additional improvements, but I guess this will suffice for now ... :)

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:my="my:my">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:variable name="vChars">"@|#\}&amp;^~/{</xsl:variable>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="text()" name="escape-text">
  <xsl:param name="s" select="."/>
  <xsl:param name="len" select="string-length($s)"/>

  <xsl:choose>
    <xsl:when test="$len >= 2">
        <xsl:variable name="halflen" select="round($len div 2)"/>
        <xsl:variable name="left">
            <xsl:call-template name="escape-text">
                <xsl:with-param name="s" select="substring($s, 1, $halflen)"/>
                <xsl:with-param name="len" select="$halflen"/>
            </xsl:call-template>
        </xsl:variable>
        <xsl:variable name="right">
            <xsl:call-template name="escape-text">
                <xsl:with-param name="s" select="substring($s, $halflen + 1)"/>
                <xsl:with-param name="len" select="$halflen"/>
            </xsl:call-template>
        </xsl:variable>
        <xsl:value-of select="concat($left, $right)"/>
    </xsl:when>
    <xsl:otherwise>
        <xsl:choose>
            <xsl:when test="not(contains($vChars, $s))">
             <xsl:value-of select="$s"/>
            </xsl:when>
            <xsl:when test="$s = '&quot;'">
                <xsl:text>&quot;\&quot;&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '@'">
                <xsl:text>&quot;@&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '|'">
                <xsl:text>&quot;|&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '#'">
                <xsl:text>&quot;#&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '\'">
                <xsl:text>&quot;\\&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '}'">
                <xsl:text>&quot;}&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '&amp;'">
                <xsl:text>&quot;&amp;&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '^'">
                <xsl:text>&quot;^&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '~'">
                <xsl:text>&quot;~&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '/'">
                <xsl:text>&quot;/&quot;</xsl:text>
            </xsl:when>
            <xsl:when test="$s = '{'">
                <xsl:text>&quot;{&quot;</xsl:text>
            </xsl:when>
        </xsl:choose>
    </xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
like image 37
Dimitre Novatchev Avatar answered Oct 21 '22 21:10

Dimitre Novatchev