
XPath/XSLT Remove Empty Tags

Tags: html, xml, xslt, xpath

I would like to remove tags which contain only whitespace/newline/tab chars, as below:

<p>    </p>

How would you do this using xpath functions and xslt templates?

Kyle asked Oct 13 '11 05:10


1 Answer

This transformation (overriding the identity rule):

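<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <!-- Drop whitespace-only text nodes from the source tree before matching -->
 <xsl:strip-space elements="*"/>

 <!-- Identity rule: copy every node and attribute unchanged -->
 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <!-- Override: emit nothing for an element with no child elements and no non-whitespace text -->
 <xsl:template match="*[not(*) and not(text()[normalize-space()])]"/>
</xsl:stylesheet>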

when applied to the following XML document:

<t>
 <a>
  <b>
    <c/>
  </b>
 </a>
 <p></p>
 <p>  </p>
 <p>Text</p>
</t>

correctly produces the wanted result:

<t>
   <a>
      <b/>
   </a>
   <p>Text</p>
</t>
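
Note that <b/> survives in this output: in the source tree it still has the child element <c/>, so not(*) is false for it, and it only becomes empty after <c/> has been dropped. If elements that end up empty should go as well, one possible variant (a sketch under that assumption, not part of the original answer) is to test for any non-whitespace text among the descendants instead:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Identity rule: copy every node and attribute unchanged -->
 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <!-- Assumed variant: drop any element that has no non-whitespace
      text anywhere beneath it, so emptiness cascades upward -->
 <xsl:template match="*[not(.//text()[normalize-space()])]"/>
</xsl:stylesheet>
```

Applied to the same document, this should leave only <t> with the single <p>Text</p> child, since <a> and everything under it contain no text. Be aware that it would also delete the document element itself if the whole document had no text.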

Remember: Using and overriding the identity rule/template is the most fundamental and powerful XSLT design pattern. It is the right choice for a variety of problems where most of the nodes are to be copied unchanged and only some specific nodes need to be altered, deleted, renamed, etc.
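
As an illustration of the same pattern used for renaming rather than deleting, here is a minimal sketch. The target name para is hypothetical and chosen only for the example; everything that is not a <p> falls through to the identity rule.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Identity rule: copy every node and attribute unchanged -->
 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <!-- Override: rename each <p> to <para> (hypothetical name),
      keeping its attributes and children -->
 <xsl:template match="p">
     <para>
       <xsl:apply-templates select="node()|@*"/>
     </para>
 </xsl:template>
</xsl:stylesheet>
```

The override is more specific than the identity rule, so it wins for <p> elements without any explicit priority attribute.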

Note: @Abel in his comment recommends that some bits of this solution need to be further explained:

For the uninitiated or curious: not(*) means: not having a child element; not(text()[normalize-space()]) means: not having a text node that contains any non-whitespace characters.
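
One consequence of that predicate: it does not look at attributes, so an element such as <img src="a.png"/> or <a name="top"/> also matches and is removed. If such elements should be kept (an assumption about the requirements, since the question is also tagged html), a possible variant adds a not(@*) test:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Identity rule: copy every node and attribute unchanged -->
 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <!-- Assumed variant: remove an element only if it has no child
      elements, no attributes, and no non-whitespace text -->
 <xsl:template match="*[not(*) and not(@*) and not(text()[normalize-space()])]"/>
</xsl:stylesheet>
```

With this version <p>  </p> is still removed, while <p class="spacer"/> would be copied through.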

Dimitre Novatchev answered Oct 16 '22 00:10