I would like to remove elements that contain only whitespace (space, newline, or tab characters), such as:
<p> </p>
How would you do this using XPath functions and XSLT templates?
This transformation (overriding the identity rule):
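<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <!-- Drop whitespace-only text nodes from the source tree -->
 <xsl:strip-space elements="*"/>

 <!-- Identity rule: copy every node and attribute unchanged -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Override: output nothing for an element that has no child
      elements and no text node with non-whitespace content -->
 <xsl:template match="*[not(*) and not(text()[normalize-space()])]"/>
</xsl:stylesheet>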
when applied to the following XML document:
<t>
 <a>
  <b>
   <c/>
  </b>
 </a>
 <p></p>
 <p> </p>
 <p>Text</p>
</t>
correctly produces the wanted result:
<t>
 <a>
  <b/>
 </a>
 <p>Text</p>
</t>
Remember: Using and overriding the identity rule/template is the most fundamental and powerful XSLT design pattern. It is the right choice for a variety of problems where most of the nodes are to be copied unchanged and only a few specific nodes need to be altered, deleted, renamed, and so on.
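To illustrate how the same pattern handles a task other than deletion, here is a minimal sketch that renames elements instead. The target name para is purely hypothetical and chosen for illustration; everything else is the identity rule already used in this answer.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes"/>

 <!-- Identity rule: copy every node and attribute unchanged -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Override: emit a para element (hypothetical name) in place of
      each p element, keeping its attributes and children -->
 <xsl:template match="p">
  <para>
   <xsl:apply-templates select="node()|@*"/>
  </para>
 </xsl:template>
</xsl:stylesheet>
```

Only the overriding template changes from one task to the next; the identity rule stays the same, which is why the pattern works well whenever most of the document should pass through untouched.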
Note: @Abel in his comment recommends that some bits of this solution need to be further explained:
For the uninitiated or curious:
not(*)
means: not having a child element;
not(text()[normalize-space()])
means: not having a text node that contains any non-whitespace characters.
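To see how the two conditions evaluate on a concrete document, the following diagnostic stylesheet may help. It is only a sketch for exploration and is not part of the solution: it prints one line per element with the name of the element and the boolean value of each condition.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>
 <!-- Same whitespace handling as in the solution -->
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
  <!-- Visit every element in document order -->
  <xsl:for-each select="//*">
   <xsl:value-of select="name()"/>
   <xsl:text>: not(*)=</xsl:text>
   <!-- true when the element has no child elements -->
   <xsl:value-of select="not(*)"/>
   <xsl:text>, not(text()[normalize-space()])=</xsl:text>
   <!-- true when no text node child has non-whitespace content -->
   <xsl:value-of select="not(text()[normalize-space()])"/>
   <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```

When this is applied to the XML document above, both conditions are true only for c and for the first two p elements, which are exactly the elements the solution removes. The element b has a child element in the source tree, so it is copied and ends up as an empty b in the result.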