
XSL - How to match consecutive comma-separated tags

Tags:

xslt

xpath

I'm trying to match a series of XML elements that are separated by commas, and then apply an XSLT transformation to the whole group of nodes plus text. For example, given the following partial XML:

<p>Some text here
    <xref id="1">1</xref>,
    <xref id="2">2</xref>,
    <xref id="3">3</xref>.
</p>

I would like to end up with:

<p>Some text here <sup>1,2,3</sup>.</p>

A much messier alternative would also be acceptable at this point:

<p>Some text here <sup>1</sup><sup>,</sup><sup>2</sup><sup>,</sup><sup>3</sup>.</p>

I have the transformation to go from a single xref to a sup:

<xsl:template match="xref">
    <sup><xsl:apply-templates/></sup>
</xsl:template>

But I'm at a loss as to how to match a group of nodes separated by commas.

Thanks.

Steven Grosmark asked Dec 27 '22 18:12


1 Answer

Update: Thanks to @Flynn1179 who alerted me that the solution wasn't producing exactly the wanted output, I have slightly modified it. Now the wanted "good" format is produced.

This XSLT 1.0 transformation:

<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes"/>

     <xsl:template match="node()|@*">
      <xsl:copy>
       <xsl:apply-templates select="node()[1]|@*"/>
      </xsl:copy>
      <xsl:apply-templates select="following-sibling::node()[1]"/>
     </xsl:template>

     <xsl:template match=
     "xref[not(preceding-sibling::node()[1]
                  [self::text() and starts-with(.,',')]
               )
          ]">

      <xsl:variable name="vBreakText" select=
      "following-sibling::text()[not(starts-with(.,','))][1]"/>

      <xsl:variable name="vPrecedingTheBreak" select=
       "$vBreakText/preceding-sibling::node()"/>

      <xsl:variable name="vFollowing" select=
      ".|following-sibling::node()"/>

      <xsl:variable name="vGroup" select=
      "$vFollowing[count(.|$vPrecedingTheBreak)
                  =
                   count($vPrecedingTheBreak)
                  ]
      "/>

      <sup>
       <xsl:apply-templates select="$vGroup" mode="group"/>
      </sup>
      <xsl:apply-templates select="$vBreakText"/>
     </xsl:template>

     <xsl:template match="text()" mode="group">
       <xsl:value-of select="normalize-space()"/>
     </xsl:template>
</xsl:stylesheet>

when applied to the following XML document (based on the provided one, but made more complex and interesting):

<p>Some text here    
    <xref id="1">1</xref>,    
    <xref id="2">2</xref>,    
    <xref id="3">3</xref>.
    <ttt/>
    <xref id="4">4</xref>,
    <xref id="5">5</xref>,
    <xref id="6">6</xref>.
    <zzz/>
</p>

produces exactly the wanted, correct result:

<p>Some text here        
    <sup>1,2,3</sup>.    
    <ttt/>
    <sup>4,5,6</sup>.    
    <zzz/>
</p>

Explanation:

  1. We use a "fine-grained" identity rule, which processes the document node by node in document order and copies the matched node "as-is".

  2. We override the identity rule with a template that matches any xref element that is the first in a group of xref elements, each of which (but the last one) is followed by an immediate text-node sibling that starts with the ',' character. Here we find the first following text-node sibling that breaks the rule (its starting character isn't ',').

  3. Then we find all the nodes in the group, using the Kayessian (after @Michael Kay) formula for the intersection of two nodesets. This formula is: $ns1[count(.|$ns2) = count($ns2)]

  4. Then we process all nodes in the group in a mode named "group".

  5. Finally, we apply templates (in anonymous mode) to the breaking text node (that is the first node following the group), so that the chain of processing continues.
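
The intersection formula from step 3 can be tried out in isolation. The following is only a minimal sketch, not part of the solution above: the element names list and item and the position-based selections are hypothetical, chosen just to give two overlapping node-sets.

```xml
<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical input:
       <list><item>a</item><item>b</item><item>c</item><item>d</item><item>e</item></list> -->
  <xsl:template match="/list">
    <!-- $ns1: every item from the second one onward (b, c, d, e) -->
    <xsl:variable name="ns1" select="item[position() &gt;= 2]"/>
    <!-- $ns2: every item up to and including the fourth (a, b, c, d) -->
    <xsl:variable name="ns2" select="item[position() &lt;= 4]"/>

    <!-- A node of $ns1 also belongs to $ns2 exactly when adding it
         to $ns2 does not make the union any bigger -->
    <xsl:for-each select="$ns1[count(.|$ns2) = count($ns2)]">
      <xsl:value-of select="."/>
      <xsl:if test="position() != last()">,</xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the input shown in the comment, the selected nodes are the items b, c and d. In the solution above the same expression plays this role with $vFollowing as $ns1 and $vPrecedingTheBreak as $ns2, which leaves the nodes from the current xref up to, but not including, the breaking text node.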

Dimitre Novatchev answered Jan 01 '23 10:01