If the contents of a citations node are something like the following:
<p>
WAJWAJADS:
</p>
<p>
asdf
</p>
<p>
ALSOAS:
</p>
<p>
lorem ipsum...<br />
lorem<br />
blah blah <i>
adfas &amp; dasdsaafs
</i>, April 2011.<br />
lorem lorem dear lord the whitespace
</p>
Is there any way to transform this into properly formatted HTML with XSLT?
normalize-space() just concatenates everything together. The best I've managed is to call normalize-space() on every p descendant inside a for-each loop and wrap each result in a p element. However, any inner tags are then still lost.
Is there a better way to parse this WYSIWYG generated trainwreck? Unfortunately I have no control over the generated XML.
I've slightly modified the answer by Martin Honnen:
<xsl:template match="text()">
  <!-- Collapse runs of whitespace and trim both ends -->
  <xsl:value-of select="normalize-space(.)"/>
  <!-- Re-add one trailing space if the text ended in a single space -->
  <xsl:if test="substring(., string-length(.)) = ' ' and substring(., string-length(.) - 1, 2) != '  '">
    <xsl:text> </xsl:text>
  </xsl:if>
</xsl:template>
It tests whether the last character is a space and the last two characters are not both spaces; if so, it inserts a single space after the normalized text.
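For context, here is a minimal sketch of how that template might sit in a complete XSLT 1.0 stylesheet. The identity template for elements and attributes is my assumption about the surrounding stylesheet, not something taken from the original answer, and the trailing-space condition is written to match the description above (last character is a space, last two characters are not both spaces).

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- indent="no" so the serializer does not add whitespace of its own -->
  <xsl:output method="html" indent="no"/>

  <!-- Assumed identity template: copies elements and attributes as they are,
       so inline tags such as <i> and <br/> survive the transformation -->
  <xsl:template match="@*|*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text nodes: collapse whitespace, then restore one trailing space
       when the original text ended in exactly one space -->
  <xsl:template match="text()">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:if test="substring(., string-length(.)) = ' ' and substring(., string-length(.) - 1, 2) != '  '">
      <xsl:text> </xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

The identity template matches `@*|*` rather than `@*|node()` so that it does not compete with the `text()` template at the same default priority. Note that only a trailing space is restored; a text node that begins with a space right after an inline element would still lose it, so a similar check on the first character may be needed.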