Selecting unique records in XSLT/XPath

I have to select only unique records from an XML document, in the context of an <xsl:for-each> loop. I am limited by Visual Studio to using XSLT 1.0.

    <availList>
        <item>
          <schDate>2010-06-24</schDate>              
          <schFrmTime>10:00:00</schFrmTime>
          <schToTime>13:00:00</schToTime>
          <variousOtherElements></variousOtherElements>
        </item>
        <item>
          <schDate>2010-06-24</schDate>              
          <schFrmTime>10:00:00</schFrmTime>
          <schToTime>13:00:00</schToTime>
          <variousOtherElements></variousOtherElements>
        </item>
        <item>
          <schDate>2010-06-25</schDate>              
          <schFrmTime>10:00:00</schFrmTime>
          <schToTime>12:00:00</schToTime>
          <variousOtherElements></variousOtherElements>
        </item>
        <item>
          <schDate>2010-06-26</schDate>              
          <schFrmTime>13:00:00</schFrmTime>
          <schToTime>14:00:00</schToTime>
          <variousOtherElements></variousOtherElements>
        </item>
        <item>
          <schDate>2010-06-26</schDate>              
          <schFrmTime>10:00:00</schFrmTime>
          <schToTime>12:00:00</schToTime>
          <variousOtherElements></variousOtherElements>
        </item>
    </availList>

Uniqueness must be based on the values of three child elements: schDate, schFrmTime and schToTime. If two item elements have the same values for all three, they are duplicates. In the above XML, items one and two are duplicates; the rest are unique. As indicated above, each item contains other elements that we do not wish to include in the comparison. 'Uniqueness' should be a function of those three elements, and those alone.

I have attempted to accomplish this through the following:

availList/item[not(schDate = preceding::schDate and schFrmTime = preceding::schFrmTime and schToTime = preceding::schToTime)]

The idea behind this is to select records where there is no preceding item with the same schDate, schFrmTime and schToTime. However, its output is missing the last item. This is because my XPath actually excludes any item whose three values each appear somewhere earlier in the document, not necessarily within the same item. No single item matches all of the last item's child elements, but because each value is individually present in some earlier item, the last item gets excluded.
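
To make that behaviour visible, here is a small diagnostic sketch (not part of the original transformation) that evaluates the three comparisons separately for the last item of the sample document. It assumes availList is the document element, as in the XML above.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:template match="/">
  <!-- Evaluate each comparison on its own for the last item only -->
  <xsl:for-each select="availList/item[last()]">
   <!-- true: the fourth item also has schDate 2010-06-26 -->
   <xsl:value-of select="schDate = preceding::schDate"/>
   <xsl:text> </xsl:text>
   <!-- true: items one to three have schFrmTime 10:00:00 -->
   <xsl:value-of select="schFrmTime = preceding::schFrmTime"/>
   <xsl:text> </xsl:text>
   <!-- true: the third item has schToTime 12:00:00 -->
   <xsl:value-of select="schToTime = preceding::schToTime"/>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```

Each comparison is satisfied by a different earlier item, so the conjunction is true and not(...) rejects the last item, even though no single preceding item matches all three values.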

I could get the correct result by comparing all child values as a concatenated string to the same concatenated values for each preceding item. Does anybody know of a way I could do this?

Daniel Situnayake asked Jun 10 '10 17:06


2 Answers

I. As a single XPath expression:

/*/item[normalize-space() and not(. = preceding-sibling::item)]

II. More efficient (XSLT) implementation, using keys:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kItemByVal" match="item" use="."/>

 <xsl:template match="/">
  <xsl:copy-of select=
   "*/item[generate-id() = generate-id(key('kItemByVal', .))]
   "/>
 </xsl:template>
</xsl:stylesheet>

Both I and II, when applied to the provided XML document, correctly select/copy the following nodes:

<item><schDate>2010-06-24</schDate><schFrmTime>10:00:00</schFrmTime><schToTime>13:00:00</schToTime></item>
<item><schDate>2010-06-25</schDate><schFrmTime>10:00:00</schFrmTime><schToTime>12:00:00</schToTime></item>
<item><schDate>2010-06-26</schDate><schFrmTime>13:00:00</schFrmTime><schToTime>14:00:00</schToTime></item>
<item><schDate>2010-06-26</schDate><schFrmTime>10:00:00</schFrmTime><schToTime>12:00:00</schToTime></item>

Update: If <item> has other children that must be ignored in the comparison, this transformation:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <xsl:key name="kItemBy3Children" match="item"
     use="concat(schDate, '+', schFrmTime, '+', schToTime)"/>

    <xsl:template match="/">
        <xsl:copy-of select=
         "*/item[generate-id()
               = generate-id(key('kItemBy3Children',
                                 concat(schDate, '+', schFrmTime, '+', schToTime)))]"/>
    </xsl:template>
</xsl:stylesheet>

produces the wanted result.
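
Since the question asks for this inside an <xsl:for-each>, the same key can drive the loop directly. The following is a minimal sketch under that assumption: the text output line is purely illustrative, and the path availList/item assumes availList is the document element. Note that an xsl:key indexes every matching item in the document, so if a document held several lists the key value would need to include something identifying the list.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:key name="kItemBy3Children" match="item"
  use="concat(schDate, '+', schFrmTime, '+', schToTime)"/>

 <xsl:template match="/">
  <!-- Visit only the first item of each (schDate, schFrmTime, schToTime) group -->
  <xsl:for-each select=
   "availList/item[generate-id()
       = generate-id(key('kItemBy3Children',
                         concat(schDate, '+', schFrmTime, '+', schToTime)))]">
   <!-- Illustrative body: one line of text per unique time slot -->
   <xsl:value-of
    select="concat(schDate, ' ', schFrmTime, '-', schToTime, '&#10;')"/>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```

On the sample document the loop body runs four times, once for each distinct combination, and the second (duplicate) item is skipped.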

Dimitre Novatchev answered Sep 18 '22 08:09


The technique I've seen is to do this in two passes: sort the items by all three key fields, and then compare each item to its preceding item (instead of all preceding items).

Is it practical for you to run two separate transformations? It makes the problem much easier.

I saw the technique in an older edition of Michael Kay's XSLT book. You might find it in some of his sample code there.
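
A rough sketch of what the two passes could look like in XSLT 1.0 follows. It is not taken from the book; it assumes availList is the document element and that the date and time values sort correctly as text, which holds for the ISO-style values in the question. The first stylesheet writes the items out in sorted order:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:template match="/availList">
  <availList>
   <!-- Pass 1: copy the items sorted by the three key fields -->
   <xsl:for-each select="item">
    <xsl:sort select="schDate"/>
    <xsl:sort select="schFrmTime"/>
    <xsl:sort select="schToTime"/>
    <xsl:copy-of select="."/>
   </xsl:for-each>
  </availList>
 </xsl:template>
</xsl:stylesheet>
```

The second stylesheet runs on the output of the first and keeps an item only when it differs from its immediate predecessor:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:template match="/availList">
  <availList>
   <xsl:for-each select="item">
    <!-- Pass 2: duplicates are now adjacent, so one neighbour is enough -->
    <xsl:variable name="prev" select="preceding-sibling::item[1]"/>
    <xsl:if test="not($prev)
                  or concat(schDate, '+', schFrmTime, '+', schToTime)
                     != concat($prev/schDate, '+', $prev/schFrmTime,
                               '+', $prev/schToTime)">
     <xsl:copy-of select="."/>
    </xsl:if>
   </xsl:for-each>
  </availList>
 </xsl:template>
</xsl:stylesheet>
```

One consequence of this design is that the result comes out in sorted order rather than original document order.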

Don Kirkby answered Sep 20 '22 08:09