I am struggling with grouping table-based XML (on multiple keys) into a hierarchy with XSLT.
The grouping is based on the first four elements (E1, E2, E4, E5); however, a group must break if a record with a different key appears in between.
Source XML:
<RECORDS>
<RECORD>
<E1>MICKEY</E1>
<E2>TEST</E2>
<E4>14</E4>
<E5>196</E5>
<F1>A1</F1>
</RECORD>
<RECORD>
<E1>MICKEY</E1>
<E2>TEST</E2>
<E4>14</E4>
<E5>196</E5>
<F1>A2</F1>
</RECORD>
<RECORD>
<E1>MICKEY</E1>
<E2>TEST</E2>
<E4>14</E4>
<E5>195</E5>
<F1>A3</F1>
</RECORD>
<RECORD>
<E1>MICKEY</E1>
<E2>TEST</E2>
<E4>14</E4>
<E5>196</E5>
<F1>A4</F1>
</RECORD>
<RECORD>
<E1>MICKEY</E1>
<E2>TEST</E2>
<E4>14</E4>
<E5>196</E5>
<F1>A5</F1>
</RECORD>
<RECORD>
<E1>DONALD</E1>
<E2>TEST</E2>
<E4>14</E4>
<E5>196</E5>
<F1>A6</F1>
</RECORD>
<RECORD>
<E1>DONALD</E1>
<E2>TEST</E2>
<E4>14</E4>
<E5>196</E5>
<F1>A7</F1>
</RECORD>
</RECORDS>
Desired output XML:
<RECORDS>
<RECORD>
<E1>MICKEY</E1>
<E2>TEST</E2>
<E4>14</E4>
<E5>196</E5>
<F>
<F1>A1</F1>
<F1>A2</F1>
</F>
</RECORD>
<RECORD>
<E1>MICKEY</E1>
<E2>TEST</E2>
<E4>14</E4>
<E5>195</E5>
<F>
<F1>A3</F1>
<F1>A4</F1>
</F>
</RECORD>
<RECORD>
<E1>MICKEY</E1> <!--Must break and not merge in first group -->
<E2>TEST</E2>
<E4>14</E4>
<E5>196</E5>
<F>
<F1>A5</F1>
</F>
</RECORD>
<RECORD>
<E1>DONALD</E1>
<E2>TEST</E2>
<E4>14</E4>
<E5>196</E5>
<F>
<F1>A6</F1>
<F1>A7</F1>
</F>
</RECORD>
</RECORDS>
Here is the XSL I have come up with so far...
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:key name="grouped" match="RECORD"
use="concat(E1, '+', E2, '+', E4 , '+', E5 )"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/*">
<RECORDS>
<xsl:apply-templates select=
"RECORD[generate-id()
=
generate-id(key('grouped',
concat(E1, '+', E2, '+', E4 , '+', E5 )
)
[1]
)
]
"/>
</RECORDS>
</xsl:template>
<xsl:template match="RECORD">
<RECORD>
<E1><xsl:value-of select="E1"/></E1>
<E2><xsl:value-of select="E2"/></E2>
<E4><xsl:value-of select="E4"/></E4>
<F>
<xsl:for select="F1">
<F1><xsl:value-of select="F1"/></F1>
</xsl:for>
</F>
</RECORD>
</xsl:template>
</xsl:stylesheet>
The issue is that I am unable to generate the inner F1 tag repeating for each F1 in the group. Also, I should get 4 RECORD elements, not the 3 that I get with this:
<RECORDS>
<RECORD>
<E1>MICKEY</E1>
<E2>TEST</E2>
<E4>14</E4>
<F></F>
</RECORD>
<RECORD>
<E1>MICKEY</E1>
<E2>TEST</E2>
<E4>14</E4>
<F></F>
</RECORD>
<RECORD>
<E1>DONALD</E1>
<E2>TEST</E2>
<E4>14</E4>
<F></F>
</RECORD>
</RECORDS>
Define a key for the property we want to use for grouping. Select all of the nodes we want to group. We'll do some tricks with the key() and generate-id() functions to find the unique grouping values. For each unique grouping value, use the key() function to retrieve all nodes that match it.
The value used to select the items in the current group. The current-grouping-key() function is only useful inside an <xsl:for-each-group> element with a group-by or group-adjacent attribute. Calling current-grouping-key() anywhere else returns the empty sequence.
Selects a sequence of nodes and/or atomic values and organizes them into subsets called groups. Available in XSLT 2.0 and later versions. Available in all Saxon editions.
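If an XSLT 2.0 processor is available, the adjacency requirement in the question maps onto the group-adjacent attribute of xsl:for-each-group. The following is only a minimal sketch under that assumption; the '|' separator and the overall template layout are illustrative choices, not taken from the original post.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/RECORDS">
    <xsl:copy>
      <!-- group-adjacent starts a new group whenever the composite key
           changes between consecutive RECORD elements -->
      <xsl:for-each-group select="RECORD"
          group-adjacent="concat(E1, '|', E2, '|', E4, '|', E5)">
        <RECORD>
          <!-- the context item here is the first RECORD of the current group -->
          <xsl:copy-of select="E1, E2, E4, E5"/>
          <F>
            <!-- current-group() holds every RECORD in this adjacent run -->
            <xsl:copy-of select="current-group()/F1"/>
          </F>
        </RECORD>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Unlike group-by, group-adjacent does not merge records that share a key but are separated by a record with a different key, so the later MICKEY/196 records should form their own group rather than joining the first one.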
Here is a solution using keys. It is shorter (28% fewer lines of code, and no horizontal scrolling required) and more robust (see the end of this answer for details).
It is also more general, because it works even when other elements that must be ignored appear between the elements we want to group (that is, when preceding-sibling::*[1] may be an element we want excluded from grouping; in the current problem, anything that is not a RECORD element):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:key name="kStartGroup" match="/*/*" use=
"generate-id(preceding-sibling::*
[not(concat(E1, '|', E2, '|', E4, '|', E5)
= concat(current()/E1, '|', current()/E2, '|', current()/E4, '|', current()/E5)
)
][1])"/>
<xsl:template match="*[not(concat(E1, '|', E2, '|', E4, '|', E5)
=
concat(preceding-sibling::*[1]/E1, '|',
preceding-sibling::*[1]/E2, '|',
preceding-sibling::*[1]/E4, '|',
preceding-sibling::*[1]/E5)
)]">
<xsl:copy>
<xsl:copy-of select="E1 | E2 | E4 | E5"/>
<F><xsl:copy-of select=
"key('kStartGroup', generate-id(preceding-sibling::*[1]))/F1"/></F>
</xsl:copy>
</xsl:template>
<xsl:template match="/*"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
Robustness / Scalability
Because this transformation doesn't contain recursion (nested calls to <xsl:apply-templates>), it is robust and scalable when applied to large XML files.
By contrast, the "sibling recursion" solution provided in another answer crashes with a stack overflow when the transformation is applied to a sufficiently large XML document. In my case the crash was observed with a source XML document of about 13,000 lines; this may vary depending on available RAM, the XSLT processor, etc.
The current transformation executes successfully even on extremely large XML documents, such as one with 1,200,000 lines.
Apparently you want to do in XSLT 1.0 the equivalent of XSLT 2.0's group-adjacent. This can be achieved using a technique known as "sibling recursion":
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/RECORDS">
<xsl:copy>
<!-- start the first group -->
<xsl:apply-templates select="RECORD[1]"/>
</xsl:copy>
</xsl:template>
<xsl:template match="RECORD">
<xsl:variable name="key" select="concat(E1, '+', E2, '+', E4 , '+', E5)" />
<xsl:copy>
<xsl:copy-of select="E1 | E2 | E4 | E5"/>
<F>
<xsl:copy-of select="F1"/>
<!-- immediate sibling in the same group -->
<xsl:apply-templates select="following-sibling::RECORD[1][concat(E1, '+', E2, '+', E4 , '+', E5) = $key]" mode="collect"/>
</F>
</xsl:copy>
<!-- start the next group -->
<xsl:apply-templates select="following-sibling::RECORD[not(concat(E1, '+', E2, '+', E4 , '+', E5)=$key)][1]"/>
</xsl:template>
<xsl:template match="RECORD" mode="collect">
<xsl:variable name="key" select="concat(E1, '+', E2, '+', E4 , '+', E5)" />
<xsl:copy-of select="F1"/>
<!-- immediate sibling in the same group -->
<xsl:apply-templates select="following-sibling::RECORD[1][concat(E1, '+', E2, '+', E4 , '+', E5) = $key]" mode="collect" />
</xsl:template>
</xsl:stylesheet>