
XSLT How to trim space before and after an element, when the element says to?

Tags: xslt, xpath

This problem emerges in formatting text documents that use TEI markup (www.tei-c.org). It is beyond my XSLT/XPATH skills. (A solution in XSLT/XPATH 1.0 is required.)

There is a mark-up element, <lb>, that marks line breaks. It can take an attribute @break. If @break="no", then any space between the <lb> and surrounding text should be ignored when generating output.

So

This little tea <lb break="no" />
pot, short and stout.

should be understood as

This little teapot, short and stout.

That is, the space after "tea" and the newline before "pot" should not be rendered in the output stream.

For the space before the <lb>, this could work:

<xsl:template match="text()[following-sibling::*[1][self::lb[@break='no']]]">
    <!-- Do something about the space here. -->
</xsl:template> 

Something similar would work for the newline after the <lb>.
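
As a rough sketch of that mirror-image case (untested, and assuming the text node directly follows the <lb> as a sibling), the pattern would look at the nearest preceding sibling element instead:

```xml
<!-- Sketch only: matches a text node whose nearest preceding sibling
     element is <lb break="no"/>. Like the template above, it does not
     cover text nested inside other elements. -->
<xsl:template match="text()[preceding-sibling::*[1][self::lb[@break='no']]]">
    <!-- Do something about the leading newline here. -->
</xsl:template>
```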

OK. But this is trickier:

This <emph>little <ref>tea </ref> </emph>
<lb break="no" />
pot, short and stout.

Now the text inside the <ref> element is not a sibling of <lb>. And the space before </ref>, the space before </emph> and the newlines before and after the <lb> all need to be excised from the output stream.

How?

Asked by JPM, Oct 09 '22 03:10


1 Answer

Here's a tested, working implementation, including how to trim whitespace from the right or left side of the text node:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*"/>
        </xsl:copy>
    </xsl:template>

    <!-- Match if the preceding node (not necessarily sibling) that is either
      a non-empty-space-text node or an <lb> is an <lb break='no'> -->
    <xsl:template match="text()[
        (preceding::node()[
            self::text()[normalize-space() != ''] or
            self::lb])
                [last()]
        [self::lb[@break='no']]
        ]">

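        <!-- Trim whitespace on the left. Thanks to Alejandro,
            http://stackoverflow.com/a/3997107/423105 -->
        <!-- Skip whitespace-only nodes: $firstNonSpace would be '', and
            substring-after(., '') returns the whole string untrimmed. -->
        <xsl:if test="normalize-space() != ''">
            <xsl:variable name="firstNonSpace"
                select="substring(normalize-space(), 1, 1)"/>
            <xsl:value-of select="concat($firstNonSpace,
                substring-after(., $firstNonSpace))"/>
        </xsl:if>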
    </xsl:template>

    <!-- Match if the next node (not necessarily sibling) that is either
      a non-empty-space-text node or an <lb> is an <lb break='no'> -->
    <xsl:template match="text()[
        following::node()[
            self::text()[normalize-space() != ''] or
            self::lb]
               [1]
        [self::lb[@break='no']]
        ]">

        <xsl:variable name="normalized" select="normalize-space()"/>
        <xsl:if test="$normalized != ''">
            <xsl:variable name="lastNonSpace"
                select="substring($normalized, string-length($normalized))"/>
            <xsl:variable name="trimmedSuffix">
                <xsl:call-template name="substring-after-last">
                    <xsl:with-param name="string" select="."/>
                    <xsl:with-param name="delimiter" select="$lastNonSpace"/>
                </xsl:call-template>
            </xsl:variable>
            <xsl:value-of select="substring(., 1, string-length(.) -
               string-length($trimmedSuffix))"/>
        </xsl:if>
        <!-- otherwise output nothing. -->
    </xsl:template>


    <!-- Thanks to Jeni Tennison:
        http://www.stylusstudio.com/xsllist/200111/post00460.html -->
    <xsl:template name="substring-after-last">
        <xsl:param name="string" />
        <xsl:param name="delimiter" />
        <xsl:choose>
            <xsl:when test="contains($string, $delimiter)">
                <xsl:call-template name="substring-after-last">
                    <xsl:with-param name="string"
                        select="substring-after($string, $delimiter)" />
                    <xsl:with-param name="delimiter" select="$delimiter" />
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise><xsl:value-of select="$string" /></xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>

My assumption here, pending the answer to my "Next ambiguity" comment above, is that if there is an <lb> element without break="no", that <lb> constitutes "surrounding text" in the sense that it serves as a boundary for ignoring whitespace.

Sample input:

<test>
    <t1>
        This <emph>little <ref>tea </ref> </emph>
        <lb break="no" />
        pot, short and stout.        
    </t1>    
    <t2>
        This <emph>little <ref>tea </ref> </emph>
        <lb />
        <lb break="no" />
        pot, short and stout.        
    </t2>    
</test>

Output:

<test>
    <t1>
        This <emph>little <ref>tea</ref></emph><lb break="no"/>pot, short and stout.        
    </t1>    
    <t2>
        This <emph>little <ref>tea </ref> </emph>
        <lb/><lb break="no"/>pot, short and stout.        
    </t2>    
</test>

This output is correct AFAICT. If not, please let me know why and I'll see about fixing it.
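
For anyone who wants to try the left-trim idiom on its own, here is a minimal sketch. The template name `trim-left` and its parameter `s` are hypothetical and not part of the stylesheet above, and the guard for all-whitespace input is an addition made for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:output method="text"/>

    <!-- Hypothetical helper: strips leading whitespace from $s only. -->
    <xsl:template name="trim-left">
        <xsl:param name="s"/>
        <!-- normalize-space() drops leading whitespace, so its first
             character is the first non-space character of $s. -->
        <xsl:variable name="first"
            select="substring(normalize-space($s), 1, 1)"/>
        <!-- Guard: if $s is all whitespace, $first is '' and
             substring-after($s, '') would return $s unchanged. -->
        <xsl:if test="$first != ''">
            <xsl:value-of select="concat($first,
                substring-after($s, $first))"/>
        </xsl:if>
    </xsl:template>

    <!-- Demo: trim a literal string and bracket the result so the
         remaining whitespace is visible. -->
    <xsl:template match="/">
        <xsl:text>[</xsl:text>
        <xsl:call-template name="trim-left">
            <xsl:with-param name="s"
                select="'&#10;   pot, short and stout.  '"/>
        </xsl:call-template>
        <xsl:text>]</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

Run against any XML input, this should print `[pot, short and stout.  ]`: the leading newline and spaces are gone, while the trailing spaces are left alone, which is what the first text() template needs.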

Answered by LarsH, Oct 12 '22 10:10