Merge and sort multiple XML Files with XSL

The problem is to merge and sort multiple XML files with XSL and output valid HTML that is viewable in Firefox >= 3.5 and, if possible, IE >= 7. The answer should be as simple as possible (performance is not important).

File a.xml

<?xml version="1.0"?>
<root>
    <tag>cc</tag>
    <tag>aa</tag>
</root>

File b.xml

<?xml version="1.0"?>
<root>
    <tag>xx</tag>
    <tag>bb</tag>
</root>

File index.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="merge.xslt"?>
<list>
    <entry>a.xml</entry>
    <entry>b.xml</entry>
</list>

File merge.xslt

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ph="http://ananas.org/2003/tips/photo">

    <xsl:output method="html"/>

    <xsl:template match="list">
        <html>
            <body>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>

    <xsl:template match="entry">
        <xsl:for-each select="document(.)/root/tag">
            <!-- This will only sort the values of a single file -->
            <xsl:sort select="." data-type="text" order="ascending" />
            - <xsl:value-of select="."/> <br/>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

Current output:

  • aa

  • cc

  • bb

  • xx

Expected output:

  • aa

  • bb

  • cc

  • xx

Asked by gaddomn, Sep 05 '11 21:09

1 Answer

The solution to this is a very short and easy transformation (absolutely no extension functions are required!):

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/*">
  <html>
   <ul>
    <xsl:apply-templates
       select="document(entry)/*/tag">
      <xsl:sort/>
    </xsl:apply-templates>
   </ul>
  </html>
 </xsl:template>

 <xsl:template match="tag">
  <li><xsl:value-of select="."/></li>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided index.xml file:

<list>
    <entry>a.xml</entry>
    <entry>b.xml</entry>
</list>

the desired result is produced:

<html>
   <ul>
      <li>aa</li>
      <li>bb</li>
      <li>cc</li>
      <li>xx</li>
   </ul>
</html>

and it is displayed in any browser as:

  • aa
  • bb
  • cc
  • xx

Explanation: This solution uses the power of the standard XSLT function document(). As defined in the W3C XSLT 1.0 Recommendation:

When the document function has exactly one argument and the argument is a node-set, then the result is the union, for each node in the argument node-set, of the result of calling the document function with the first argument being the string-value of the node

This explains the effect of this fragment from our code:

<xsl:apply-templates
   select="document(entry)/*/tag">
  <xsl:sort/>
</xsl:apply-templates>

What happens here is that the argument to the document() function is the node-set of all entry children of the top element of index.xml. The result is the union of the document nodes of all files named by those entry elements.
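
For the two-entry index.xml above, this should be equivalent to writing the union by hand. A minimal sketch, assuming exactly two entry children (the positional predicates are only for illustration and are not part of the original solution):

```xml
<!-- Illustrative only: hand-written union for exactly two entries.
     document(entry) does the same for any number of entry elements. -->
<xsl:apply-templates
   select="(document(entry[1]) | document(entry[2]))/*/tag">
  <xsl:sort/>
</xsl:apply-templates>
```

The node-set form is preferable because it does not need to know in advance how many files index.xml lists.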

Therefore:

select="document(entry)/*/tag"

selects all tag elements in all documents referenced in index.xml. They are then sorted (by xsl:sort), and each element of the sorted node list is processed by the template matching tag.
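
The same idea can also be applied to the stylesheet from the question with a small change. The following is a sketch rather than a tested drop-in: it assumes every referenced file has a root top element, as a.xml and b.xml do, and it keeps the question's html output method and its "- value" line format. The only real difference is that the xsl:for-each moves into the list template and receives all entry nodes at once:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="html"/>

    <xsl:template match="list">
        <html>
            <body>
                <!-- Passing all entry nodes at once makes document() return
                     one node-set, so the sort spans every referenced file -->
                <xsl:for-each select="document(entry)/root/tag">
                    <xsl:sort select="." data-type="text" order="ascending"/>
                    - <xsl:value-of select="."/> <br/>
                </xsl:for-each>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
```

In the question's version, document(.) was called once per entry, so each xsl:for-each only ever saw the tag elements of a single file.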

Answered by Dimitre Novatchev, Nov 19 '22 13:11