XSLT to remove non-ASCII

Tags: xml, xslt, xpath

I need to modify an XML document with XSLT: every non-ASCII character should be replaced with a space.

Example input:

<input>azerty12€_étè</input>

Only these characters are allowed:

!"#$%&'()*+,-./0123456789:;=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

Expected output:

<input>azerty12 _ t </input>

Rene asked Jan 21 '16 19:01


1 Answer

Assuming you are limited to XSLT 1.0, you could try:

<!-- characters to keep: the allowed list from the question, including the backslash -->
<xsl:variable name="ascii">!"#$%&amp;'()*+,-./0123456789:;=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~</xsl:variable>
<xsl:variable name="spaces" select="'                                                                                             '" />

<xsl:template match="input">
    <xsl:copy>
        <xsl:value-of select="translate(., translate(., $ascii, ''), $spaces)"/>
    </xsl:copy>
</xsl:template>

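This is a bit of a hack: it works only as long as the $spaces variable holds enough spaces to cover the non-ASCII characters found in the input. translate() pairs the stripped-out characters with $spaces position by position, so a character that first appears beyond the length of $spaces is deleted rather than replaced by a space.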
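
The effect of a too-short replacement string can be checked in isolation. The following is a minimal sketch with made-up strings (they are not from the question); it can be run against any XML document, since the input is ignored. The square brackets are only there to make trailing spaces visible.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- two characters to replace, two spaces available: prints [a b ] -->
        <xsl:value-of select="concat('[', translate('aébè', 'éè', '  '), ']')"/>
        <xsl:text>&#10;</xsl:text>
        <!-- only one space available: è has no counterpart and is dropped, prints [a b] -->
        <xsl:value-of select="concat('[', translate('aébè', 'éè', ' '), ']')"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```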

If you don't want to rely on such an assumption, you will have to use a recursive template that replaces them one character at a time:

<xsl:template match="input">
    <xsl:copy>
        <xsl:call-template name="replace-non-ascii">
            <xsl:with-param name="text" select="."/>
        </xsl:call-template>
    </xsl:copy>
</xsl:template>

<xsl:template name="replace-non-ascii">
    <xsl:param name="text"/>
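    <!-- characters to keep; the leading space is part of the list -->
    <xsl:variable name="ascii"> !"#$%&amp;'()*+,-./0123456789:;=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~</xsl:variable>
    <!-- whatever is left after deleting the allowed characters -->
    <xsl:variable name="non-ascii" select="translate($text, $ascii, '')" />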
    <xsl:choose>
        <xsl:when test="$non-ascii">
            <xsl:variable name="char" select="substring($non-ascii, 1, 1)" />
            <!-- recursive call: replace every occurrence of $char, then check the result again -->
            <xsl:call-template name="replace-non-ascii">
                <xsl:with-param name="text" select="translate($text, $char, ' ')"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$text"/>
        </xsl:otherwise>
    </xsl:choose>   
</xsl:template>
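
If the XSLT 1.0 restriction does not apply, the same result should be reachable without recursion through the XPath 2.0 replace() function. The following is a minimal sketch, assuming an XSLT 2.0 processor such as Saxon; the identity template is an addition for illustration so that the rest of the document is copied unchanged. The pattern [^ -~] matches anything outside the range from space to tilde, which is slightly wider than the list in the question: it also keeps the space character and <.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- identity template (assumed here): copy everything else as is -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- replace each character outside space..tilde with one space -->
    <xsl:template match="input">
        <xsl:copy>
            <xsl:value-of select="replace(., '[^ -~]', ' ')"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

Applied to the example input, this turns azerty12€_étè into azerty12 _ t followed by a trailing space, matching the expected output.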

michael.hor257k answered Oct 13 '22 00:10