split and flatten nodes with XSLT

I cannot have any nested spans, so I need to flatten them and concatenate their class attributes so I can track which classes are parents.

Here's a simplified input:

<body>
    <h1 class="section">Title</h1>
    <p class="main">
        ZZZ
        <span class="a">
            AAA
            <span class="b">
                BBB
                <span class="c">
                    CCC
                    <preserveMe>
                        eeee
                    </preserveMe>
                </span>
                bbb
                <preserveMe>
                    eeee
                </preserveMe>
            </span>
            aaa
        </span>
    </p>
</body>

Here's the desired output:

<body>
    <h1 class="section">Title</h1>
    <p class="main">
        ZZZ
        <span class="a">
            AAA
        </span>
        <span class="ab">
            BBB
        </span>
        <span class="abc">
            CCC
            <preserveMe>
                eeee
            </preserveMe>
        </span>
        <span class="ab">
            bbb
            <preserveMe>
                eeee
            </preserveMe>
        </span>
        <span class="a">
            aaa
        </span>
    </p>
</body>

Here's the closest I've come (I'm really new to this, so even getting this far took me a long time...)

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="/">
        <p>
            <xsl:apply-templates/>
        </p>
    </xsl:template>

    <xsl:template match="*/span">
      <span class='{concat(../../@class,../@class,@class)}'>
           <xsl:value-of select='.'/>
       </span>
       <xsl:apply-templates/>
    </xsl:template>

</xsl:stylesheet>

You can see the result of my failed attempt and how far it is from what I really wanted if you run it yourself. Ideally, I'd like a solution that accepts an arbitrary number of nested levels and can also handle interrupted nests (span, span, notSpan, span...).

Edit: I have added tags inside the nested structure, as requested by commenters below. I'm using XSLT 1.0, but I could use another version if needed.

Edit 2: I realized that my example was oversimplified compared to what I actually need to convert. Namely, I cannot lose classes from other tags; only spans can be combined.

DocBuckets asked Apr 17 '15 23:04


1 Answer

As I mentioned in the opening comments, this is far from being trivial. Here's another approach you may consider:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

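<!-- process the children of p plus every text node found at any depth inside a span, in document order -->
<xsl:template match="p">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()|.//span/text()"/>
    </xsl:copy>
</xsl:template>

<!-- wrap each text node that is a child of a span in a span of its own -->
<xsl:template match="span/text()">
    <span>
        <!-- class = the classes of all ancestor spans, outermost first -->
        <xsl:attribute name="class">
            <xsl:for-each select="ancestor::span">
                <xsl:value-of select="@class"/>
            </xsl:for-each>
        </xsl:attribute>
        <!-- carry along sibling elements; nested spans yield nothing here because of the empty template below -->
        <xsl:apply-templates select="preceding-sibling::*"/>
        <xsl:value-of select="."/>
        <xsl:if test="not(following-sibling::text())">
            <xsl:apply-templates select="following-sibling::*"/>
        </xsl:if>
    </span>
</xsl:template>

<!-- suppress the original spans; their text is output by the template above -->
<xsl:template match="span"/>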

</xsl:stylesheet>

This is to a large extent similar to what was suggested earlier by Lingamurthy CS - but you will see a difference with the following test input:

XML

<body>
    <h1 class="section">Title</h1>
    <p class="main">
        ZZZ
        <preserveMe>0</preserveMe>
        <span class="a">
            AAA
            <span class="b">
                BBB
                <span class="c">
                    CCC
                    <preserveMe>c</preserveMe>
                </span>
                bbb
                <preserveMe>b</preserveMe>
            </span>
            aaa
        </span>
        <preserveMe>1</preserveMe>
    </p>
</body>
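
One edge case that the test input above does not exercise: preceding-sibling::* selects every element sibling before the current text node, so in a span whose content runs element, text, span, text, the leading element would be copied once for each of the two text nodes. A possible refinement, offered as an untested sketch and not as part of the original answer, restricts the selection to the elements whose nearest following text sibling is the current text node. The predicate built on generate-id() and current() is the only change; everything else is as in the stylesheet above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform: copy everything not handled below -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<!-- process the children of p plus every text node inside a span -->
<xsl:template match="p">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()|.//span/text()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="span/text()">
    <span>
        <xsl:attribute name="class">
            <xsl:for-each select="ancestor::span">
                <xsl:value-of select="@class"/>
            </xsl:for-each>
        </xsl:attribute>
        <!-- assumption: an element belongs to the first text node that follows it,
             so only the elements between the previous text node and this one are selected -->
        <xsl:apply-templates select="preceding-sibling::*[generate-id(following-sibling::text()[1]) = generate-id(current())]"/>
        <xsl:value-of select="."/>
        <!-- trailing elements go with the last text node, as before -->
        <xsl:if test="not(following-sibling::text())">
            <xsl:apply-templates select="following-sibling::*"/>
        </xsl:if>
    </span>
</xsl:template>

<!-- suppress the original spans -->
<xsl:template match="span"/>

</xsl:stylesheet>
```

With the test input above, both versions should produce the same result, since no span there has a non-span element in front of one of its text nodes.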

michael.hor257k answered Nov 15 '22 10:11