
Transforming node contents to remove whitespace

Tags: xslt, xslt-1.0

If the contents of a citations node look something like the following:

                <p>

            WAJWAJADS:

            </p>

<p>

            asdf

            </p>

<p>

            ALSOAS:

            </p>

<p>

            lorem ipsum...<br />
lorem<br />
blah blah <i>

            adfas &amp; dasdsaafs

            </i>, April 2011.<br />
lorem lorem dear lord the whitespace

            </p>

Is there any way to transform this to properly formatted HTML with XSLT?

normalize-space() just concatenates everything together. The best I've managed to do is call normalize-space() on all p descendants within a for-each loop and wrap each result in a p element. However, any inner tags are still lost that way.

Is there a better way to parse this WYSIWYG generated trainwreck? Unfortunately I have no control over the generated XML.

asked Dec 21 '25 01:12 by tenub


1 Answer

I've slightly modified the answer by Martin Honnen:

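<!-- Collapse whitespace in each text node, then restore one trailing space
     when the original text ended in a single space (e.g. right before <i>). -->
<xsl:template match="text()">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:if test="substring(., string-length(.)) = ' ' and substring(., string-length(.) - 1, 2) != '  '">
        <xsl:text> </xsl:text>
    </xsl:if>
</xsl:template>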

It tests whether the last character is a space and the last two characters are not both spaces. If so, it emits a single space after the normalized text.
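
For context, here is a minimal sketch of how that template might sit in a complete XSLT 1.0 stylesheet. The identity template, the `xsl:output` settings, and the explicit `priority` attribute are assumptions added for illustration. They are not part of the answer above, and the answer by Martin Honnen that it builds on is not reproduced here.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Assumption: indent="no" so the serializer does not add whitespace back. -->
    <xsl:output method="html" indent="no"/>

    <!-- Assumed identity template (not from the answer): copies elements and
         attributes as they are, so <p>, <i> and <br /> are preserved. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- The text() template from the answer. priority="1" is an addition:
         text() and node() share the same default priority, so this makes the
         override explicit instead of relying on template order. -->
    <xsl:template match="text()" priority="1">
        <xsl:value-of select="normalize-space(.)"/>
        <!-- Last two characters: start at length - 1 and take 2. -->
        <xsl:if test="substring(., string-length(.)) = ' '
                      and substring(., string-length(.) - 1, 2) != '  '">
            <xsl:text> </xsl:text>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>
```

Applied to the sample in the question, the text node `blah blah ` in front of `<i>` should keep its trailing space, while the padding inside each `<p>` and between the `<p>` elements is dropped. Like the answer, this sketch only preserves trailing spaces: a text node that begins with a single space right after an inline element would still lose it.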

answered Dec 24 '25 11:12 by Joel M. Lamsen


