
Grouping on elements in XML transformation with XSLT

I am struggling with grouping (on multiple keys) to turn table-like XML into a hierarchy with XSLT.

The grouping is based on the first four elements (E1, E2, E4, E5); however, a group must break whenever a RECORD with different values appears in between.

Source XML:

<RECORDS> 
<RECORD>
    <E1>MICKEY</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>196</E5>
    <F1>A1</F1>
</RECORD>
<RECORD>
    <E1>MICKEY</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>196</E5>
    <F1>A2</F1>
</RECORD>
 <RECORD>
    <E1>MICKEY</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>195</E5>
    <F1>A3</F1>
  </RECORD>
 <RECORD>
    <E1>MICKEY</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>195</E5>
    <F1>A4</F1>
  </RECORD>
 <RECORD>
    <E1>MICKEY</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>196</E5>
    <F1>A5</F1>
  </RECORD>
     <RECORD>
    <E1>DONALD</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>196</E5>
    <F1>A6</F1>
  </RECORD>
 <RECORD>
    <E1>DONALD</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>196</E5>
    <F1>A7</F1>
  </RECORD>
 </RECORDS>

Desired output XML:

 <RECORDS>
 <RECORD>
    <E1>MICKEY</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>196</E5>
    <F>
     <F1>A1</F1>
     <F1>A2</F1>
    </F>
  </RECORD>
  <RECORD>
    <E1>MICKEY</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>195</E5>
    <F>
     <F1>A3</F1>
     <F1>A4</F1>
    </F>
  </RECORD>
  <RECORD>
   <E1>MICKEY</E1> <!--Must break and not merge in first group -->
   <E2>TEST</E2>
   <E4>14</E4>
   <E5>196</E5>
   <F>   
   <F1>A5</F1>
   </F>
  </RECORD>
  <RECORD>
    <E1>DONALD</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>196</E5>
    <F>
     <F1>A6</F1>
     <F1>A7</F1>
    </F>
  </RECORD>
 </RECORDS>

Here is the XSL I have come up with so far...

<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes"/>
 <xsl:key name="grouped" match="RECORD"
  use="concat(E1, '+', E2, '+', E4 , '+', E5 )"/>

<xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>
 <xsl:template match="/*">
  <RECORDS>
   <xsl:apply-templates select=
   "RECORD[generate-id()
          =
           generate-id(key('grouped',
                        concat(E1, '+', E2, '+', E4 , '+', E5 )
                          )
                           [1]
                      )
           ]
   "/>
  </RECORDS>
 </xsl:template>
 <xsl:template match="RECORD">
   <RECORD>
  <E1><xsl:value-of select="E1"/></E1>
<E2><xsl:value-of select="E2"/></E2>
<E4><xsl:value-of select="E4"/></E4>
<F>
<xsl:for select="F1">
<F1><xsl:value-of select="F1"/></F1>
</xsl:for>

</F>
   </RECORD>

</xsl:template>
</xsl:stylesheet>

The issue is that I am unable to generate the inner tag repeating for each F1. Also, I should get 4 RECORD elements, not the 3 that I get with this:

<RECORDS>
  <RECORD>
    <E1>MICKEY</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <F></F>
  </RECORD>
  <RECORD>
    <E1>MICKEY</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <F></F>
  </RECORD>
  <RECORD>
    <E1>DONALD</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <F></F>
  </RECORD>
</RECORDS>
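
A note on the attempt above, offered as a sketch rather than a full answer: xsl:for is not an XSLT instruction (the looping instruction is xsl:for-each), which likely explains the empty F elements, and the RECORD template never outputs E5. The stylesheet below fixes only those two points by pulling every F1 from the key; it keeps the original key name grouped. It should still produce three RECORD elements, because a key collects all RECORDs with equal values regardless of position, so it cannot break a group on adjacency.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- index every RECORD by its composite key -->
  <xsl:key name="grouped" match="RECORD"
           use="concat(E1, '+', E2, '+', E4, '+', E5)"/>

  <xsl:template match="/RECORDS">
    <RECORDS>
      <!-- select only the first RECORD of each distinct composite key -->
      <xsl:apply-templates select="RECORD[generate-id() =
          generate-id(key('grouped', concat(E1, '+', E2, '+', E4, '+', E5))[1])]"/>
    </RECORDS>
  </xsl:template>

  <xsl:template match="RECORD">
    <RECORD>
      <xsl:copy-of select="E1 | E2 | E4 | E5"/>
      <F>
        <!-- every F1 from all RECORDs sharing this key, adjacent or not -->
        <xsl:copy-of select="key('grouped', concat(E1, '+', E2, '+', E4, '+', E5))/F1"/>
      </F>
    </RECORD>
  </xsl:template>
</xsl:stylesheet>
```

Breaking groups on adjacency needs a different technique that compares each RECORD with its neighbouring siblings.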
ggonsalv asked May 05 '16 10:05



2 Answers

Here is a solution using keys. It is shorter (28% fewer lines of code, and no horizontal scrolling required) and more robust (see the end of this answer for details).

It is also more general: it works even when other elements that must be ignored sit between the elements we want to group, that is, when preceding-sibling::*[1] may be an element that should be excluded from grouping (in the current problem, anything that is not a RECORD element):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:key name="kStartGroup" match="/*/*" use=
    "generate-id(preceding-sibling::*
      [not(concat(E1, '|', E2, '|', E4, '|', E5)
          = concat(current()/E1, '|', current()/E2, '|', current()/E4, '|', current()/E5)
          )
      ][1])"/>
  <xsl:template match="*[not(concat(E1, '|', E2, '|', E4, '|', E5) 
                            = 
                              concat(preceding-sibling::*[1]/E1, '|', 
                                     preceding-sibling::*[1]/E2, '|', 
                                     preceding-sibling::*[1]/E4, '|',
                                     preceding-sibling::*[1]/E5)
                             )]">
    <xsl:copy>
      <xsl:copy-of select="E1 | E2 | E4 | E5"/>
      <F><xsl:copy-of select=
                      "key('kStartGroup', generate-id(preceding-sibling::*[1]))/F1"/></F>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="/*"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>      
  <xsl:template match="text()"/>
</xsl:stylesheet>

Robustness / Scalability

Because this transformation contains no recursion (nested calls to <xsl:apply-templates>), it is robust and scalable when applied to large XML files.

By contrast, the "sibling recursion" solution given in another answer crashes with a stack overflow when applied to a sufficiently large XML document. In my case the crash occurred with a source XML document of about 13,000 lines; this may vary with available RAM, XSLT processor, etc.

The current transformation executes successfully even on extremely large XML documents, such as one with 1,200,000 lines.

Dimitre Novatchev answered Sep 19 '22 17:09


Apparently you want to do in XSLT 1.0 the equivalent of XSLT 2.0's group-adjacent. This can be achieved using a technique known as "sibling recursion":

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="/RECORDS">
    <xsl:copy>
        <!-- start the first group -->
        <xsl:apply-templates select="RECORD[1]"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="RECORD">
    <xsl:variable name="key" select="concat(E1, '+', E2, '+', E4 , '+', E5)" />
    <xsl:copy>
        <xsl:copy-of select="E1 | E2 | E4 | E5"/>
        <F>
            <xsl:copy-of select="F1"/>
            <!-- immediate sibling in the same group -->
            <xsl:apply-templates select="following-sibling::RECORD[1][concat(E1, '+', E2, '+', E4 , '+', E5) = $key]" mode="collect"/>
        </F>
    </xsl:copy>
    <!-- start the next group -->
    <xsl:apply-templates select="following-sibling::RECORD[not(concat(E1, '+', E2, '+', E4 , '+', E5)=$key)][1]"/>
</xsl:template>

<xsl:template match="RECORD" mode="collect">
    <xsl:variable name="key" select="concat(E1, '+', E2, '+', E4 , '+', E5)" />
    <xsl:copy-of select="F1"/>
    <!-- immediate sibling in the same group -->
    <xsl:apply-templates select="following-sibling::RECORD[1][concat(E1, '+', E2, '+', E4 , '+', E5) = $key]" mode="collect" />
</xsl:template> 

</xsl:stylesheet>
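
For comparison, this is roughly what the XSLT 2.0 version could look like with xsl:for-each-group and group-adjacent. It is only a sketch: it assumes an XSLT 2.0 (or later) processor such as Saxon, and it reuses the '+' separator from the stylesheet above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/RECORDS">
    <xsl:copy>
      <!-- consecutive RECORDs with the same composite key form one group -->
      <xsl:for-each-group select="RECORD"
          group-adjacent="concat(E1, '+', E2, '+', E4, '+', E5)">
        <RECORD>
          <!-- the context item is the first RECORD of the current group -->
          <xsl:copy-of select="E1 | E2 | E4 | E5"/>
          <F>
            <xsl:copy-of select="current-group()/F1"/>
          </F>
        </RECORD>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Because group-adjacent starts a new group whenever the key changes from one RECORD to the next, a later run with the same values is not merged into an earlier group.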
michael.hor257k answered Sep 16 '22 17:09