
XSLT - How to keep only wanted elements from XML

Tags: xml, xslt, xpath

I have a number of XML files that contain a lot of overhead. I want to keep only about 20 specific elements and filter out everything else. I know the names of all the elements I want to keep, whether each one is a child element, and what its parent is. After the transformation, the kept elements must still sit at their original place in the hierarchy.

For example, I want to keep ONLY

<ns:currency>

in:

<ns:stuff>
 <ns:things>
  <ns:currency>somecurrency</ns:currency>
  <ns:currency_code/>
  <ns:currency_code2/>
  <ns:currency_code3/>
  <ns:currency_code4/>
 </ns:things>
</ns:stuff>

and make it look like this:

<ns:stuff>
 <ns:things>
  <ns:currency>somecurrency</ns:currency>
 </ns:things>
</ns:stuff>

What would be the best way of constructing an XSLT to accomplish this?

cc0 asked Apr 26 '11 12:04



2 Answers

This general transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ns="some:ns">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <ns:WhiteList>
  <name>ns:currency</name>
  <name>ns:currency_code3</name>
 </ns:WhiteList>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match=
  "*[not(descendant-or-self::*[name()=document('')/*/ns:WhiteList/*])]"/>
</xsl:stylesheet>

when applied to the provided XML document (with a namespace declaration added to make it well-formed):

<ns:stuff xmlns:ns="some:ns">
    <ns:things>
        <ns:currency>somecurrency</ns:currency>
        <ns:currency_code/>
        <ns:currency_code2/>
        <ns:currency_code3/>
        <ns:currency_code4/>
    </ns:things>
</ns:stuff>

produces the wanted result (white-listed elements and their structural relations are preserved):

<ns:stuff xmlns:ns="some:ns">
   <ns:things>
      <ns:currency>somecurrency</ns:currency>
      <ns:currency_code3/>
   </ns:things>
</ns:stuff>

Explanation:

  1. The identity rule/template copies all nodes "as-is".

  2. The stylesheet contains a top-level <ns:WhiteList> element whose <name> children specify the names of all white-listed elements, that is, the elements to be preserved together with their structural relationships in the document.

  3. The <ns:WhiteList> element is best kept in a separate document, so that the stylesheet doesn't need to be edited when names are added. Here the whitelist is in the same stylesheet just for convenience.

  4. A single template overrides the identity template. It matches, and thereby deletes, any element that is neither white-listed itself nor has a white-listed descendant.
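
As a rough sketch of point 3, the whitelist could live in its own file. The file name whitelist.xml and its <whitelist> root element are assumptions made up for this illustration; they are not part of the stylesheet above. XSLT 1.0 does not allow variable references in match patterns, so the document() call stays inline in the pattern:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Assumed content of whitelist.xml (hypothetical file, stored
      next to this stylesheet):

      <whitelist>
        <name>ns:currency</name>
        <name>ns:currency_code3</name>
      </whitelist>
 -->

 <!-- Identity rule: copy every node and attribute as-is -->
 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <!-- Drop any element that is neither listed in whitelist.xml
      nor has a listed descendant -->
 <xsl:template match=
  "*[not(descendant-or-self::*[name()=document('whitelist.xml')/*/name])]"/>
</xsl:stylesheet>
```

A relative URI passed to document() is resolved against the location of the stylesheet, so whitelist.xml is expected to sit next to it. As in the stylesheet above, name() compares the lexical name including the prefix, so the source documents have to use the same ns prefix as the whitelist entries.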

Dimitre Novatchev answered Sep 23 '22 14:09



In XSLT you usually don't remove the elements you want to drop; instead, you copy the elements you want to keep:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ns="http://www.example.com/ns#"
    version="1.0">

    <xsl:output method="xml" indent="yes" omit-xml-declaration="no"/>

     <xsl:template match="/ns:stuff">
        <xsl:copy>
            <xsl:apply-templates select="ns:things"/>
        </xsl:copy>
     </xsl:template>

     <xsl:template match="ns:things">
        <xsl:copy>
            <xsl:apply-templates select="ns:currency"/>
            <xsl:apply-templates select="ns:currency_code3"/>
        </xsl:copy>
     </xsl:template>

     <xsl:template match="ns:currency">
        <xsl:copy-of select="."/>
     </xsl:template>

     <xsl:template match="ns:currency_code3">
        <xsl:copy-of select="."/>
     </xsl:template>

</xsl:stylesheet>

The example above copies only currency and currency_code3. The output is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<ns:stuff xmlns:ns="http://www.example.com/ns#">
   <ns:things>
      <ns:currency>somecurrency</ns:currency>
      <ns:currency_code3/>
   </ns:things>
</ns:stuff>

Note: I added a namespace declaration for your prefix ns.
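
With around 20 elements to keep, one template per element gets verbose. A possible condensed sketch of the same approach uses union patterns (|), so that all container elements share one template and all leaf elements share another. This is a variation on the stylesheet above rather than an exact equivalent, and it assumes that the kept leaf elements should be copied with their whole subtree:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ns="http://www.example.com/ns#"
    version="1.0">

    <xsl:output method="xml" indent="yes" omit-xml-declaration="no"/>

    <!-- Containers: copy the element itself, then descend only into
         the children that are to be kept -->
    <xsl:template match="ns:stuff | ns:things">
        <xsl:copy>
            <xsl:apply-templates
                select="ns:things | ns:currency | ns:currency_code3"/>
        </xsl:copy>
    </xsl:template>

    <!-- Leaves: copy the element together with its content -->
    <xsl:template match="ns:currency | ns:currency_code3">
        <xsl:copy-of select="."/>
    </xsl:template>

</xsl:stylesheet>
```

The trade-off is that the shared select no longer states which child belongs to which parent: an ns:currency placed directly under ns:stuff would be kept as well. If the parent/child pairing has to be enforced, the one-template-per-container form above is the safer choice.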

If you instead want to copy everything except a few elements, you may see this answer.
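
For reference, a minimal sketch of that opposite case, assuming the common pattern of an identity template plus an empty template for the unwanted elements (this is not taken from the linked answer; the element names are the ones from the question):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ns="http://www.example.com/ns#"
    version="1.0">

    <xsl:output method="xml" indent="yes"/>

    <!-- Identity template: copies every node and attribute unchanged -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- Empty template: matched elements produce no output, so they
         are dropped together with their subtrees -->
    <xsl:template
        match="ns:currency_code | ns:currency_code2 | ns:currency_code4"/>

</xsl:stylesheet>
```

The empty template wins over the identity template because a pattern that names an element has a higher default priority than node().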

MarcoS answered Sep 22 '22 14:09
