XSLT: split a tree at a descendant node

Tags: xslt

I'm trying to split a tree of elements based on the location of a descendant element. (In particular, I'm trying to parse Adobe's IDML.) I'd like to be able to convert a tree that looks like:

<ParagraphStyleRange style="foo">
 <CharacterStyleRange style="bar">
  <Content>foo</Content>
  <Br />
  <Content>bar</Content>
 </CharacterStyleRange>
 <CharacterStyleRange style="bop">
  <Content>baz</Content>
  <Br />
  <Hyperlink>
   <Content>boo</Content>
   <Br />
   <Content>meep</Content>
  </Hyperlink>
 </CharacterStyleRange>
</ParagraphStyleRange>

into split trees:

<ParagraphStyleRange style="foo">
 <CharacterStyleRange style="bar">
  <Content>foo</Content>
 </CharacterStyleRange>
</ParagraphStyleRange>

<ParagraphStyleRange style="foo">
 <CharacterStyleRange style="bar">
  <Content>bar</Content>
 </CharacterStyleRange>
 <CharacterStyleRange style="bop">
  <Content>baz</Content>
 </CharacterStyleRange>
</ParagraphStyleRange>

<ParagraphStyleRange style="foo">
 <CharacterStyleRange style="bop">
  <Hyperlink>
   <Content>boo</Content>
  </Hyperlink>
 </CharacterStyleRange>
</ParagraphStyleRange>

<ParagraphStyleRange style="foo">
 <CharacterStyleRange style="bop">
  <Hyperlink>
   <Content>meep</Content>
  </Hyperlink>
 </CharacterStyleRange>
</ParagraphStyleRange>

which I can then parse using normal XSL. (EDIT: I originally showed the <Br/> tags in their original place, but it doesn't really matter if they are there or not, since the information they contained is now represented by the split elements. I think it's probably easier to solve this problem without worrying about keeping them in.)

I tried using xsl:for-each-group as suggested in the XSLT 2.0 spec (e.g. <xsl:for-each-group select="CharacterStyleRange/*" group-ending-with="Br">), but I can't figure out how to apply that at every level of the tree: <Br /> tags can appear at any level (e.g. inside a <Hyperlink> element inside a <CharacterStyleRange> element). It also limits me to templates that apply only at the chosen depth.
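
For reference, the single-level approach described above might look roughly like the sketch below. It is an assumption about what such a stylesheet would contain, not the asker's actual code; the variable name vRange and the identity template are illustrative. It splits each CharacterStyleRange at its direct Br children only, so a Br nested deeper (for example inside a Hyperlink) is copied through unchanged, which is the limitation in question.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Split each CharacterStyleRange at its direct Br children only. -->
  <xsl:template match="CharacterStyleRange">
    <!-- Hypothetical name: remembers the element being split, because
         the context item changes inside xsl:for-each-group. -->
    <xsl:variable name="vRange" select="."/>
    <xsl:for-each-group select="*" group-ending-with="Br">
      <!-- Skip groups that hold nothing but the Br itself. -->
      <xsl:if test="current-group()[not(self::Br)]">
        <CharacterStyleRange>
          <xsl:copy-of select="$vRange/@*"/>
          <!-- Deep copy: any Br nested inside these children survives. -->
          <xsl:copy-of select="current-group()[not(self::Br)]"/>
        </CharacterStyleRange>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:template>

  <!-- Identity-style copy for everything else. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Because the grouping is tied to the select="*" of one specific element, handling Br at another depth would need a separate template for that depth.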

EDIT: My example code shows only one place where the tree needs to be split, but there can be any number of split points (always the same element, though.)

EDIT 2: I've added a more detailed example to show some of the complications.

Quentin Smith asked Feb 26 '11 19:02



1 Answer

This XSLT 1.0 (and of course, also XSLT 2.0) transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="text()"/>

  <!-- Entry point: treat every Br in the document as a split point. -->
  <xsl:template match="/">
    <xsl:call-template name="Split">
      <xsl:with-param name="pSplitters" select="//Br"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Process the split points one at a time, in document order. -->
  <xsl:template name="Split">
    <xsl:param name="pSplitters"/>

    <xsl:if test="$pSplitters">
      <xsl:for-each select="$pSplitters[1]">

        <!-- Emit the siblings before this Br, skipping any that
             contain a Br of their own. -->
        <xsl:call-template name="buildTree">
          <xsl:with-param name="pLeafs" select=
           "preceding-sibling::node()[not(descendant::Br)]"/>
        </xsl:call-template>

        <!-- Emit the siblings after this Br only if none of them
             has a Br descendant. -->
        <xsl:if test="not(following-sibling::node()//Br)">
          <xsl:call-template name="buildTree">
            <xsl:with-param name="pLeafs" select=
             "following-sibling::node()"/>
          </xsl:call-template>
        </xsl:if>

        <!-- Recurse on the remaining split points. -->
        <xsl:call-template name="Split">
          <xsl:with-param name="pSplitters" select=
           "$pSplitters[position() > 1]"/>
        </xsl:call-template>
      </xsl:for-each>
    </xsl:if>
  </xsl:template>

  <!-- Rebuild the chain of ancestors (outermost first) around the
       given leaf nodes. pAncestors defaults to the ancestors of the
       current Br. -->
  <xsl:template name="buildTree">
    <xsl:param name="pAncestors" select="ancestor::*"/>
    <xsl:param name="pLeafs"/>

    <xsl:choose>
      <xsl:when test="not($pAncestors)">
        <!-- No ancestors left to wrap: output the leaves. -->
        <xsl:copy-of select="$pLeafs"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="vtopAncestor" select="$pAncestors[1]"/>

        <!-- Recreate the outermost remaining ancestor with its
             namespaces and attributes, then recurse inward. -->
        <xsl:element name="{name($vtopAncestor)}"
             namespace="{namespace-uri($vtopAncestor)}">
          <xsl:copy-of select=
           "$vtopAncestor/namespace::* | $vtopAncestor/@*"/>
          <xsl:call-template name="buildTree">
            <xsl:with-param name="pAncestors"
                 select="$pAncestors[position()>1]"/>
            <xsl:with-param name="pLeafs" select="$pLeafs"/>
          </xsl:call-template>
        </xsl:element>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document:

<ParagraphStyleRange style="foo">
    <CharacterStyleRange style="bar">
        <Content>foo</Content>
        <Br />
        <Content>bar</Content>
    </CharacterStyleRange>
    <CharacterStyleRange style="bop">
        <Content>baz</Content>
        <Br />
        <Hyperlink>
            <Content>boo</Content>
            <Br />
            <Content>meep</Content>
        </Hyperlink>
    </CharacterStyleRange>
</ParagraphStyleRange>

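produces the result below. Each run of siblings before or after a Br is emitted under its own copy of the ancestor chain, so bar and baz end up in separate ParagraphStyleRange elements rather than sharing one as in the question's expected output: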

<ParagraphStyleRange style="foo">
   <CharacterStyleRange style="bar">
      <Content>foo</Content>
   </CharacterStyleRange>
</ParagraphStyleRange>
<ParagraphStyleRange style="foo">
   <CharacterStyleRange style="bar">
      <Content>bar</Content>
   </CharacterStyleRange>
</ParagraphStyleRange>
<ParagraphStyleRange style="foo">
   <CharacterStyleRange style="bop">
      <Content>baz</Content>
   </CharacterStyleRange>
</ParagraphStyleRange>
<ParagraphStyleRange style="foo">
   <CharacterStyleRange style="bop">
      <Hyperlink>
         <Content>boo</Content>
      </Hyperlink>
   </CharacterStyleRange>
</ParagraphStyleRange>
<ParagraphStyleRange style="foo">
   <CharacterStyleRange style="bop">
      <Hyperlink>
         <Content>meep</Content>
      </Hyperlink>
   </CharacterStyleRange>
</ParagraphStyleRange>

Dimitre Novatchev answered Sep 28 '22 19:09
