Transforming node contents to remove whitespace

Question

If the contents of a citations node look something like the following:

                <p>

            WAJWAJADS:

            </p>

<p>

            asdf

            </p>

<p>

            ALSOAS:

            </p>

<p>

            lorem ipsum...<br />
lorem<br />
blah blah <i>

            adfas &amp; dasdsaafs

            </i>, April 2011.<br />
lorem lorem dear lord the whitespace

            </p>

Is there any way to transform this to properly formatted HTML with XSLT?

normalize-space() just concatenates everything together. The best I've managed is to call normalize-space() on all p descendants within a for-each loop and wrap each result in a p element. However, any inner tags are still lost that way.

Is there a better way to parse this WYSIWYG generated trainwreck? Unfortunately I have no control over the generated XML.

Joel M. Lamsen · Accepted Answer

I've slightly modified the answer by Martin Honnen:

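<xsl:template match="text()">
    <!-- Trim both ends and collapse internal runs of whitespace -->
    <xsl:value-of select="normalize-space(.)"/>
    <!-- Put back one trailing space if the text ended in exactly one space -->
    <xsl:if test="substring(., string-length(.)) = ' ' and substring(., string-length(.) - 1) != '  '">
        <xsl:text> </xsl:text>
    </xsl:if>
</xsl:template>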

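It tests whether the last character is a space and the last two characters are not both spaces. If so, it inserts a single space after the normalized text. In the sample above, this keeps the single space between "blah blah" and the <i> element, while the run of indentation before </i> is dropped.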
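
For context, here is a minimal sketch of a complete XSLT 1.0 stylesheet built around that template. The identity template and the xsl:output settings are my assumptions, not something stated in the answer; the identity template is what lets inner tags such as <i> and <br /> pass through untouched.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Assumption: HTML output without added indentation -->
    <xsl:output method="html" indent="no"/>

    <!-- Identity template (assumed): copy elements and attributes as they are -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Text nodes: normalize, then restore a single significant trailing space -->
    <xsl:template match="text()">
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:if test="substring(., string-length(.)) = ' ' and substring(., string-length(.) - 1) != '  '">
            <xsl:text> </xsl:text>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>
```

The text() template takes precedence over the identity template for text nodes because its match pattern has a higher default priority than node(). With xsltproc this could be run as something like `xsltproc clean.xsl citations.xml`, where both file names are placeholders. Note that the test only looks for a literal space character, so text that ends in a tab or a newline directly before an inline tag would not get its separator restored.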

Tags:

xslt

xslt-1.0

tenub
