
Remove Elements and/or Attributes by Name per XSL Parameters

The following does the job of removing unwanted elements and attributes by name ("removeMe" in this example) from an XML file:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node() | @*" name="identity">
  <xsl:copy>
   <xsl:apply-templates select="node() | @*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="removeMe"/>
</xsl:stylesheet>

The problems are that it does not distinguish between elements and attributes, the name is hard-coded, and it can only take one name. How could this be rewritten to use a couple of input parameters, like the ones below, to remove one or more specific elements and/or attributes?

<xsl:param name="removeElementsNamed"/>
<xsl:param name="removeAttributesNamed"/>

The desired result is the ability to remove one or more elements and/or one or more attributes while still distinguishing between elements and attributes (in other words, it should be possible to remove all "time" elements without also removing all "time" attributes).

While I required XSLT 1.0 this round, XSLT 2.0 solutions in the accepted and other answers may be useful to others.

Witman asked Feb 21 '12 22:02



1 Answer

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:param name="removeElementsNamed" select="'x'"/>

 <xsl:template match="node()|@*" name="identity">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="*">
  <xsl:if test="not(name() = $removeElementsNamed)">
   <xsl:call-template name="identity"/>
  </xsl:if>
 </xsl:template>
</xsl:stylesheet>

when applied on any XML document, say this:

<t>
    <a>
        <b/>
        <x/>
    </a>
    <c/>
    <x/>
    <d/>
</t>

produces the wanted, correct result: a copy of the source XML document in which every occurrence of an element whose name is the value of the $removeElementsNamed parameter is deleted:

<t>
   <a>
      <b/>
   </a>
   <c/>
   <d/>
</t>

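Do note: In XSLT 1.0 it is an error to have a variable or parameter reference inside a template match pattern. This is why the solutions by @Jan Thomä and @treeMonkey both raise an error with any XSLT 1.0-compliant processor. The restriction applies only to match patterns; expressions in test and select attributes may reference parameters, which is why the xsl:if above is legal.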
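Since the restriction concerns match patterns only, the filtering can also be moved into the select expression of the identity rule. The following is a minimal sketch of that alternative, not part of the original answer; it assumes the same single-name $removeElementsNamed parameter as above:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:param name="removeElementsNamed" select="'x'"/>

 <!-- Identity rule that filters while selecting children.
      A select expression, unlike a match pattern, may
      reference a parameter in XSLT 1.0. -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <!-- Skip child elements whose name equals the parameter;
        text, comments, PIs and attributes are still processed. -->
   <xsl:apply-templates
    select="node()[not(self::* and name() = $removeElementsNamed)] | @*"/>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>
```

One difference to be aware of: the document element is selected by the built-in rule for the root node, not by this template, so this sketch never removes the document element itself, whereas the match="*" version above would.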

Update: Here is a more complicated solution that accepts a pipe-separated list of the names of the elements to delete. The list must begin and end with a pipe, as in '|x|c|', because each element name is looked up wrapped in pipes:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:param name="removeElementsNamed" select="'|x|c|'"/>

 <xsl:template match="node()|@*" name="identity">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="*">
  <xsl:if test=
   "not(contains($removeElementsNamed,
                 concat('|',name(),'|' )
                 )
        )
   ">
   <xsl:call-template name="identity"/>
  </xsl:if>
 </xsl:template>
</xsl:stylesheet>

When applied to the same XML document (above), the transformation again produces the wanted, correct output: the source XML document with all elements whose names are listed in the $removeElementsNamed parameter deleted:

<t>
   <a>
      <b/>
   </a>
   <d/>
</t>
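The delimiters on both sides of each name are what prevent partial matches: with the list '|xy|c|', an element named x is kept, because '|x|' is not a substring of the list. If requiring callers to supply the outer pipes is inconvenient, the stylesheet can add them itself. This is a small sketch under that assumption; the variable name vDelimited is hypothetical and not from the original answer:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Callers may now pass 'x|c' without outer pipes -->
 <xsl:param name="removeElementsNamed" select="'x|c'"/>

 <!-- Hypothetical helper: wrap the list in pipes once, up front.
      A value that already has outer pipes still works, since the
      extra pipes only add empty entries. -->
 <xsl:variable name="vDelimited"
  select="concat('|', $removeElementsNamed, '|')"/>

 <xsl:template match="node()|@*" name="identity">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="*">
  <!-- Copy the element only if '|name|' is absent from the list -->
  <xsl:if test="not(contains($vDelimited, concat('|', name(), '|')))">
   <xsl:call-template name="identity"/>
  </xsl:if>
 </xsl:template>
</xsl:stylesheet>
```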

Update2: The same transformation as in Update1, but written in XSLT 2.0:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:param name="removeElementsNamed" select="'|x|c|'"/>

 <xsl:template match="node()|@*" name="identity">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
 "*[name() = tokenize($removeElementsNamed, '\|')]"/>
</xsl:stylesheet>

Update: The OP has added the requirement to also be able to delete all attributes that have some specific name.

Here is the slightly modified transformation to accommodate this new requirement:

<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>

     <xsl:param name="removeElementsNamed" select="'x'"/>
     <xsl:param name="removeAttributesNamed" select="'n'"/>

     <xsl:template match="node()|@*" name="identity">
      <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
     </xsl:template>

     <xsl:template match="*">
      <xsl:if test="not(name() = $removeElementsNamed)">
       <xsl:call-template name="identity"/>
      </xsl:if>
     </xsl:template>

     <xsl:template match="@*">
      <xsl:if test="not(name() = $removeAttributesNamed)">
       <xsl:call-template name="identity"/>
      </xsl:if>
     </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the XML document below (the one used before but with a few attributes added):

<t>
    <a>
        <b m="1" n="2"/>
        <x/>
    </a>
    <c/>
    <x/>
    <d n="3"/>
</t>

the wanted, correct result is produced (all elements named x and all attributes named n are deleted):

<t>
   <a>
      <b m="1"/>
   </a>
   <c/>
   <d/>
</t>

UPDATE2: As again requested by the OP, we now add the ability to pass two pipe-separated lists of names: one for the elements to delete and one for the attributes to delete:

<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>

     <xsl:param name="removeElementsNamed" select="'|c|x|'"/>
     <xsl:param name="removeAttributesNamed" select="'|n|p|'"/>

     <xsl:template match="node()|@*" name="identity">
      <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
     </xsl:template>

     <xsl:template match="*">
      <xsl:if test=
      "not(contains($removeElementsNamed,
                    concat('|', name(), '|')
                    )
           )
      ">
       <xsl:call-template name="identity"/>
      </xsl:if>
     </xsl:template>

     <xsl:template match="@*">
      <xsl:if test=
      "not(contains($removeAttributesNamed,
                    concat('|', name(), '|')
                    )
           )
       ">
       <xsl:call-template name="identity"/>
      </xsl:if>
     </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the following XML document:

<t>
    <a p="0">
        <b m="1" n="2"/>
        <x/>
    </a>
    <c/>
    <x/>
    <d n="3"/>
</t>

the wanted, correct result is produced (elements with names c and x and attributes with names n and p are deleted):

<t>
   <a>
      <b m="1"/>
   </a>
   <d/>
</t>
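For readers on XSLT 2.0, the final transformation can probably be condensed in the same way as the earlier XSLT 2.0 rewrite, because XSLT 2.0 permits parameter references in match patterns. The following is a sketch along those lines rather than a tested part of the original answer; it assumes the same pipe-separated parameter format:

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:param name="removeElementsNamed" select="'|c|x|'"/>
 <xsl:param name="removeAttributesNamed" select="'|n|p|'"/>

 <!-- Identity rule: copy everything not overridden below -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Empty templates delete whatever they match. tokenize() splits
      the list on the pipe character (escaped, since the second
      argument is a regular expression). -->
 <xsl:template match="*[name() = tokenize($removeElementsNamed, '\|')]"/>
 <xsl:template match="@*[name() = tokenize($removeAttributesNamed, '\|')]"/>
</xsl:stylesheet>
```

The two deleting templates carry predicates, so their default priority is higher than that of the identity rule and they win whenever a name is in the corresponding list. Elements and attributes are checked against separate lists, so removing every "time" element leaves "time" attributes untouched.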
Dimitre Novatchev answered Sep 29 '22 11:09
