I'm trying to split a tree of elements based on the location of a descendant element. (In particular, I'm trying to parse Adobe's IDML.) I'd like to be able to convert a tree that looks like:
<ParagraphStyleRange style="foo">
<CharacterStyleRange style="bar">
<Content>foo</Content>
<Br />
<Content>bar</Content>
</CharacterStyleRange>
<CharacterStyleRange style="bop">
<Content>baz</Content>
<Br />
<Hyperlink>
<Content>boo</Content>
<Br />
<Content>meep</Content>
</Hyperlink>
</CharacterStyleRange>
</ParagraphStyleRange>
into split trees:
<ParagraphStyleRange style="foo">
<CharacterStyleRange style="bar">
<Content>foo</Content>
</CharacterStyleRange>
</ParagraphStyleRange>
<ParagraphStyleRange style="foo">
<CharacterStyleRange style="bar">
<Content>bar</Content>
</CharacterStyleRange>
<CharacterStyleRange style="bop">
<Content>baz</Content>
</CharacterStyleRange>
</ParagraphStyleRange>
<ParagraphStyleRange style="foo">
<CharacterStyleRange style="bop">
<Hyperlink>
<Content>boo</Content>
</Hyperlink>
</CharacterStyleRange>
</ParagraphStyleRange>
<ParagraphStyleRange style="foo">
<CharacterStyleRange style="bop">
<Hyperlink>
<Content>meep</Content>
</Hyperlink>
</CharacterStyleRange>
</ParagraphStyleRange>
which I can then parse using normal XSL. (EDIT: I originally showed the <Br/> tags in their original place, but it doesn't really matter whether they are there or not, since the information they contained is now represented by the split elements. I think it's probably easier to solve this problem without worrying about keeping them in.)
I tried using xsl:for-each-group as suggested in the XSLT 2.0 spec (e.g. <xsl:for-each-group select="CharacterStyleRange/*" group-ending-with="Br">), but I can't figure out how to apply that at every level of the tree: <Br /> tags can appear at any level (e.g. inside a <Hyperlink> element inside a <CharacterStyleRange> element). It also limits me to templates that apply only at the chosen depth.
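For reference, the single-level attempt described above might look like the following. This is an untested sketch that assumes an XSLT 2.0 processor; the variable name range is only illustrative. It splits the direct children of each CharacterStyleRange at every Br, but a Hyperlink child is copied whole, including any Br inside it, which is exactly the limitation in question.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="CharacterStyleRange">
    <!-- Remember the original element: inside for-each-group the
         context item is the first item of the current group -->
    <xsl:variable name="range" select="."/>
    <xsl:for-each-group select="*" group-ending-with="Br">
      <CharacterStyleRange>
        <xsl:copy-of select="$range/@*"/>
        <!-- Copy the group without the Br that closes it; a Br nested
             deeper (e.g. inside Hyperlink) is NOT split here -->
        <xsl:copy-of select="current-group()[not(self::Br)]"/>
      </CharacterStyleRange>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```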
EDIT: My example code shows only one place where the tree needs to be split, but there can be any number of split points (always the same element, though).
EDIT 2: I've added a more detailed example to show some of the complications.
This XSLT 1.0 (and of course, also XSLT 2.0) transformation:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Suppress the built-in copying of text nodes -->
  <xsl:template match="text()"/>

  <!-- Start with every Br in the document as a split point -->
  <xsl:template match="/">
    <xsl:call-template name="Split">
      <xsl:with-param name="pSplitters"
          select="//Br"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Handle the first remaining Br, then recurse on the rest -->
  <xsl:template name="Split">
    <xsl:param name="pSplitters"/>

    <xsl:if test="$pSplitters">
      <xsl:for-each select="$pSplitters[1]">
        <!-- Tree for the siblings before this Br -->
        <xsl:call-template name="buildTree">
          <xsl:with-param name="pLeafs" select=
              "preceding-sibling::node()[not(descendant::Br)]"/>
        </xsl:call-template>

        <!-- Tree for the siblings after this Br, but only when
             no further Br is nested inside them -->
        <xsl:if test=
            "not(following-sibling::node()//Br)">
          <xsl:call-template name="buildTree">
            <xsl:with-param name="pLeafs" select=
                "following-sibling::node()"/>
          </xsl:call-template>
        </xsl:if>

        <xsl:call-template name="Split">
          <xsl:with-param name="pSplitters" select=
              "$pSplitters[position() > 1]"/>
        </xsl:call-template>
      </xsl:for-each>
    </xsl:if>
  </xsl:template>

  <!-- Rebuild the chain of ancestors (outermost first) of the
       current Br and place the given leaf nodes inside it -->
  <xsl:template name="buildTree">
    <xsl:param name="pAncestors" select="ancestor::*"/>
    <xsl:param name="pLeafs"/>

    <xsl:choose>
      <xsl:when test="not($pAncestors)">
        <xsl:copy-of select="$pLeafs"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="vtopAncestor" select="$pAncestors[1]"/>
        <xsl:element name="{name($vtopAncestor)}"
            namespace="{namespace-uri($vtopAncestor)}">
          <xsl:copy-of select=
              "$vtopAncestor/namespace::* | $vtopAncestor/@*"/>
          <xsl:call-template name="buildTree">
            <xsl:with-param name="pAncestors"
                select="$pAncestors[position()>1]"/>
            <xsl:with-param name="pLeafs" select="$pLeafs"/>
          </xsl:call-template>
        </xsl:element>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
when applied on the provided XML document:
<ParagraphStyleRange style="foo">
<CharacterStyleRange style="bar">
<Content>foo</Content>
<Br />
<Content>bar</Content>
</CharacterStyleRange>
<CharacterStyleRange style="bop">
<Content>baz</Content>
<Br />
<Hyperlink>
<Content>boo</Content>
<Br />
<Content>meep</Content>
</Hyperlink>
</CharacterStyleRange>
</ParagraphStyleRange>
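produces the result below. Note that each Br is split using only its own siblings, so bar and baz end up in separate ParagraphStyleRange elements rather than sharing one as in the question's expected output: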
<ParagraphStyleRange style="foo">
<CharacterStyleRange style="bar">
<Content>foo</Content>
</CharacterStyleRange>
</ParagraphStyleRange>
<ParagraphStyleRange style="foo">
<CharacterStyleRange style="bar">
<Content>bar</Content>
</CharacterStyleRange>
</ParagraphStyleRange>
<ParagraphStyleRange style="foo">
<CharacterStyleRange style="bop">
<Content>baz</Content>
</CharacterStyleRange>
</ParagraphStyleRange>
<ParagraphStyleRange style="foo">
<CharacterStyleRange style="bop">
<Hyperlink>
<Content>boo</Content>
</Hyperlink>
</CharacterStyleRange>
</ParagraphStyleRange>
<ParagraphStyleRange style="foo">
<CharacterStyleRange style="bop">
<Hyperlink>
<Content>meep</Content>
</Hyperlink>
</CharacterStyleRange>
</ParagraphStyleRange>
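If the grouping from the question is required, where content from different CharacterStyleRange elements that lies between the same two Br elements stays in one ParagraphStyleRange, one possible alternative is to build one output tree per segment instead of one per Br. The following is an untested sketch, assuming an XSLT 2.0 processor and assuming that the content to keep consists of elements without child elements (such as Content). The mode name segment, the tunnel parameter keep, and the variables root and i are illustrative names, not part of the original answer.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/*">
    <xsl:variable name="root" select="."/>
    <!-- Segment i holds the leaf elements preceded by exactly i Br -->
    <xsl:for-each select="0 to count(.//Br)">
      <xsl:variable name="i" select="."/>
      <xsl:apply-templates select="$root" mode="segment">
        <xsl:with-param name="keep" tunnel="yes" select=
            "$root//*[not(*) and not(self::Br)]
                     [count(preceding::Br) = $i]"/>
      </xsl:apply-templates>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="*" mode="segment">
    <xsl:param name="keep" tunnel="yes"/>
    <xsl:choose>
      <!-- A leaf that belongs to this segment: copy it as is -->
      <xsl:when test=". intersect $keep">
        <xsl:copy-of select="."/>
      </xsl:when>
      <!-- An ancestor of such a leaf: copy the shell and descend -->
      <xsl:when test="descendant::* intersect $keep">
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <xsl:apply-templates select="*" mode="segment"/>
        </xsl:copy>
      </xsl:when>
      <!-- Anything else (including Br) is dropped -->
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Because every element is filtered against the same set of kept leaves, the Br elements may sit at any depth, and a segment with no leaves (for example two consecutive Br elements) produces no output.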