
Distinct values with XSLT 1.0 when XPath has multiple criteria

Yet another question about getting distinct values using XSLT 1.0. Here's a stupid, made-up example that should illustrate my problem.

<?xml version="1.0" encoding="UTF-8"?>
<moviesByYear>
    <year1994>
        <movie>
            <genre>Action</genre>
            <director>A</director>
        </movie>
    </year1994>
    <year1994>
        <movie>
            <genre>Comedy</genre>
            <director>A</director>
        </movie>
    </year1994>
    <year1994>
        <movie>
            <genre>Drama</genre>
            <director>B</director>
        </movie>
    </year1994>
    <year1994>
        <movie>
            <genre>Thriller</genre>
            <director>C</director>
        </movie>
    </year1994>
    <year1995>
        <movie>
            <genre>Action</genre>
            <director>A</director>
        </movie>
    </year1995>
    <year1995>
        <movie>
            <genre>Comedy</genre>
            <director>C</director>
        </movie>
    </year1995>
    <year1996>
        <movie>
            <genre>Thriller</genre>
            <director>A</director>
        </movie>
    </year1996>
</moviesByYear>

Now let's say that I'd like to list all years that produced movies that are either comedies or directed by director B. I use the following stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <xsl:output method="text" encoding="UTF-8" indent="no"/>
    <xsl:template match="/">
        <xsl:for-each select="/moviesByYear/*[movie/genre='Comedy' or movie/director='B']">
            <xsl:value-of select="name()"/>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

This gives me the following output:

year1994year1994year1995

I have not yet found any solution for getting distinct values that would work here. For example, using name(.) != name(following-sibling::*) causes year1994 to be excluded altogether.

In my real-world case I have a complex XML structure and an XPath with many criteria that picks out a number of nodes, from which I need to get an output of distinct node names.

Update: michael.hor257k gave an elegant solution to this, but using it I faced a problem with xsl:key. Allow me to alter the scenario a bit:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <genres>
        <genre>Action</genre>
        <genre>Comedy</genre>
        <genre>Drama</genre>
        <genre>Thriller</genre>
    </genres>
    <moviesByYear>
        <year1994>
            <movie>
                <genre>Action</genre>
                <director>A</director>
            </movie>
        </year1994>
        <year1994>
            <movie>
                <genre>Comedy</genre>
                <director>A</director>
            </movie>
        </year1994>
        <year1994>
            <movie>
                <genre>Drama</genre>
                <director>B</director>
            </movie>
        </year1994>
        <year1994>
            <movie>
                <genre>Thriller</genre>
                <director>C</director>
            </movie>
        </year1994>
        <year1995>
            <movie>
                <genre>Action</genre>
                <director>A</director>
            </movie>
        </year1995>
        <year1995>
            <movie>
                <genre>Comedy</genre>
                <director>C</director>
            </movie>
        </year1995>
        <year1996>
            <movie>
                <genre>Thriller</genre>
                <director>A</director>
            </movie>
        </year1996>
    </moviesByYear>
</root>

Now let's say that I want a list of genres, each of which lists years that produced movies of that genre or movies directed by director B. Stylesheet:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="urn:schemas-microsoft-com:xslt"
extension-element-prefixes="exsl">
<xsl:output method="text" version="1.0" encoding="UTF-8" indent="no"/>

<xsl:template match="/">
    <xsl:for-each select="/root/genres/genre">
        <xsl:call-template name="output">
            <xsl:with-param name="genre">
                <xsl:value-of select="."/>
            </xsl:with-param>
        </xsl:call-template>
    </xsl:for-each>
</xsl:template>

<xsl:param name="director" select="'B'"/>

<xsl:key name="year" match="year" use="." />

<xsl:template name="output">
    <xsl:param name="genre"/>

    <!-- first pass -->
    <xsl:variable name="years">
        <xsl:for-each select="/root/moviesByYear/*/movie[genre=$genre or director=$director]"> 
            <year><xsl:value-of select="local-name(..)"/></year>
        </xsl:for-each>
    </xsl:variable>
    <xsl:variable name="years-set" select="exsl:node-set($years)" />

    <!-- final pass -->
    <xsl:value-of select="concat($genre, ': ')"/> 
    <xsl:for-each select="$years-set/year[count(. | key('year', .)[1]) = 1]">
        <xsl:value-of select="."/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>

</xsl:template>

</xsl:stylesheet>

This produces the following output:

Action: year1994year1995
Comedy: 
Drama: 
Thriller: year1996

As you can see, each year is listed only once in the entire output, rather than once per genre. The desired output would have been:

Action: year1994year1995
Comedy: year1994year1995
Drama: year1994
Thriller: year1994year1996
Ingo88 asked Dec 15 '22 22:12



1 Answer

Here's a different implementation of Muenchian grouping - one that allows you to parametrize the criteria by which the movies are selected.

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:param name="genre" select="'Comedy'"/>
<xsl:param name="director" select="'B'"/>

<xsl:key name="year" match="year" use="." />

<xsl:template match="/">

    <!-- first pass -->
    <xsl:variable name="years">
        <xsl:for-each select="moviesByYear/*/movie[genre=$genre or director=$director]"> 
            <year><xsl:value-of select="local-name(..)"/></year>
        </xsl:for-each>
    </xsl:variable>
    <xsl:variable name="years-set" select="exsl:node-set($years)" />

    <!-- final pass -->
    <output>
        <xsl:for-each select="$years-set/year[count(. | key('year', .)[1]) = 1]">
            <xsl:copy-of select="."/>
        </xsl:for-each>
    </output>

</xsl:template>

</xsl:stylesheet>

When the above is applied to your example input, the result is:

<?xml version="1.0" encoding="UTF-8"?>
<output>
   <year>year1994</year>
   <year>year1995</year>
</output>
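
The predicate in the final pass is the usual Muenchian test: a year element is kept only when it is the first node the key returns for its own value. The same test is often written with generate-id() instead of count(). A minimal sketch of that variant follows, assuming (as the stylesheet above already does) a processor that supports the EXSLT node-set() function; only the final-pass predicate differs.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:param name="genre" select="'Comedy'"/>
<xsl:param name="director" select="'B'"/>

<xsl:key name="year" match="year" use="." />

<xsl:template match="/">
    <!-- first pass: one <year> per matching movie, duplicates included -->
    <xsl:variable name="years">
        <xsl:for-each select="moviesByYear/*/movie[genre=$genre or director=$director]">
            <year><xsl:value-of select="local-name(..)"/></year>
        </xsl:for-each>
    </xsl:variable>
    <xsl:variable name="years-set" select="exsl:node-set($years)" />

    <!-- final pass: keep a <year> only if it is the first node in its key group -->
    <output>
        <xsl:for-each select="$years-set/year[generate-id() = generate-id(key('year', .)[1])]">
            <xsl:copy-of select="."/>
        </xsl:for-each>
    </output>
</xsl:template>

</xsl:stylesheet>
```

Both forms select the same nodes; which one to use is mostly a matter of taste.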

Edit:

With regard to your modified input, I believe I would do it this way:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:param name="director" select="'B'"/>

<xsl:key name="movies-by-genre" match="movie" use="genre" />
<xsl:key name="movies-by-director" match="movie" use="director" />
<xsl:key name="year" match="year" use="." />

<xsl:template match="/">
    <output>
        <xsl:apply-templates select="root/genres/genre"/>
    </output>
</xsl:template>

<xsl:template match="genre">
    <!-- first pass -->
    <xsl:variable name="years">
        <xsl:for-each select="key('movies-by-genre', .) | key('movies-by-director', $director)"> 
            <year><xsl:value-of select="local-name(..)"/></year>
        </xsl:for-each>
    </xsl:variable>
    <xsl:variable name="years-set" select="exsl:node-set($years)" />
    <!-- final pass -->
    <genre name="{.}">
        <xsl:for-each select="$years-set/year[count(. | key('year', .)[1]) = 1]">
            <xsl:copy-of select="."/>
        </xsl:for-each>
    </genre>
</xsl:template>

</xsl:stylesheet>

The result here is:

<?xml version="1.0" encoding="UTF-8"?>
<output>
   <genre name="Action">
      <year>year1994</year>
      <year>year1995</year>
   </genre>
   <genre name="Comedy">
      <year>year1994</year>
      <year>year1995</year>
   </genre>
   <genre name="Drama">
      <year>year1994</year>
   </genre>
   <genre name="Thriller">
      <year>year1994</year>
      <year>year1996</year>
   </genre>
</output>

Note: the two added keys are for efficiency only - they are not required for the main purpose here.
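
To illustrate that note, here is a sketch of the same stylesheet with the two lookup keys removed and the first pass written as a plain path expression. The variable name genre inside the template is my own choice for this illustration; it is needed because inside the predicate "." refers to the movie, not to the genre being processed.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:param name="director" select="'B'"/>

<!-- the only key still needed: it drives the de-duplication -->
<xsl:key name="year" match="year" use="." />

<xsl:template match="/">
    <output>
        <xsl:apply-templates select="root/genres/genre"/>
    </output>
</xsl:template>

<xsl:template match="genre">
    <!-- remember the current genre for use inside the predicate -->
    <xsl:variable name="genre" select="."/>
    <!-- first pass, without movies-by-genre / movies-by-director -->
    <xsl:variable name="years">
        <xsl:for-each select="/root/moviesByYear/*/movie[genre=$genre or director=$director]">
            <year><xsl:value-of select="local-name(..)"/></year>
        </xsl:for-each>
    </xsl:variable>
    <xsl:variable name="years-set" select="exsl:node-set($years)" />
    <!-- final pass -->
    <genre name="{.}">
        <xsl:for-each select="$years-set/year[count(. | key('year', .)[1]) = 1]">
            <xsl:copy-of select="."/>
        </xsl:for-each>
    </genre>
</xsl:template>

</xsl:stylesheet>
```

This selects the same movies in the same document order; the keyed version just saves the processor from rescanning all movies for every genre.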


Edit 2:

On second thought, we could do this all in a single pass, thus (hopefully) avoiding the issues Xalan and MSXML have with processing a variable - but still using Muenchian grouping:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:param name="director" select="'B'"/>

<xsl:key name="year" match="moviesByYear/*" use="local-name()" />

<xsl:template match="/">
    <output>
        <xsl:apply-templates select="root/genres/genre"/>
    </output>
</xsl:template>

<xsl:template match="genre">
    <xsl:variable name="genre" select="." />
    <genre name="{$genre}">
        <xsl:for-each select="../../moviesByYear/* 
        [count(. | key('year', local-name())[1]) = 1]
        [key('year', local-name())/movie[genre=$genre or director=$director]]">
            <year>
                <xsl:value-of select="local-name()"/>
            </year>  
        </xsl:for-each>
    </genre>
</xsl:template>

</xsl:stylesheet>
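
The same single-pass idea should carry over to the original input (the one without the genres list). A sketch, assuming genre is passed as a global parameter as in the first stylesheet, and using text output with one year per line (my choice, for brevity):

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>

<xsl:param name="genre" select="'Comedy'"/>
<xsl:param name="director" select="'B'"/>

<!-- group the year elements by their own element name -->
<xsl:key name="year" match="moviesByYear/*" use="local-name()" />

<xsl:template match="/">
    <!-- 1st predicate: keep only the first element of each name group;
         2nd predicate: keep the group only if any of its members
         contains a movie that meets the criteria -->
    <xsl:for-each select="moviesByYear/*
        [count(. | key('year', local-name())[1]) = 1]
        [key('year', local-name())/movie[genre=$genre or director=$director]]">
        <xsl:value-of select="local-name()"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

With the default parameter values this should list year1994 and year1995, each once. No node-set() extension is needed, since nothing is stored in an intermediate variable.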
michael.hor257k answered Jan 01 '23 10:01
