Sorting XML with XSLT - entire XML-schema is not known

Tags: sorting, xml, xslt

I am wondering whether XSLT can sort part of an XML file when I don't know the entire XML schema.

For example, I would like to sort the following XML file so that the /CATALOG/CD elements are ordered by /CATALOG/CD/TITLE:

<CATALOG attrib1="value1">
  <DVD2>
    <TITLE>The Godfather2</TITLE>
  </DVD2>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
  <CD attrib4="value4">
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>
      <CATALOG>
        <CD><TITLE>E</TITLE></CD>
        <CD><TITLE>I</TITLE></CD>
        <CD><TITLE>D</TITLE></CD>
      </CATALOG>
    </PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD attrib2="value2">
    <TITLE attrib3="value3">Greatest Hits</TITLE>
    <ARTIST>Dolly Parton</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>RCA</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1982</YEAR>
  </CD>
  <DVD>
    <TITLE>The Godfather1</TITLE>
  </DVD>
</CATALOG>

The output should be:

<CATALOG attrib1="value1">
  <CD attrib4="value4">
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>
      <CATALOG>
        <CD><TITLE>E</TITLE></CD>
        <CD><TITLE>I</TITLE></CD>
        <CD><TITLE>D</TITLE></CD>
      </CATALOG>
    </PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD attrib2="value2">
    <TITLE attrib3="value3">Greatest Hits</TITLE>
    <ARTIST>Dolly Parton</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>RCA</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1982</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
  <DVD2>
    <TITLE>The Godfather2</TITLE>
  </DVD2>
  <DVD>
    <TITLE>The Godfather1</TITLE>
  </DVD>
</CATALOG>

The following is one of my many attempts:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <!--<CATALOG>-->
    <xsl:for-each select="CATALOG/CD">
      <xsl:sort select="TITLE" />
      <xsl:copy-of select="."/>
    </xsl:for-each>
    <!--</CATALOG>-->
  </xsl:template>
</xsl:stylesheet>

The problem is that, with this XSLT, the parts of the XML outside the CD list are not output.
I could uncomment the two commented-out lines, but that is exactly what I want to avoid:
in that case, any attributes added to the CATALOG element would not be copied to the output XML.
I don't want to rebuild the XML file: I just want to sort it while knowing only part of the XML schema exactly.

This is easy to implement with, for example, .NET (using XmlDocument and XmlNode objects) or Python's lxml library, but is it possible with XSLT?

Thanks!

Note: It is hard to come up with a sample input that rules out every misreading of the question, so here are the requirements in detail:

  • only CD elements directly under the top-level CATALOG should be sorted (for example, the CD elements nested inside the Bob Dylan entry should be left untouched)
  • it does not matter whether elements other than CD (for example DVD and DVD2) end up at the beginning or the end of the list
  • nothing should be missing from the output XML: no elements, attributes, values, or comments
  • non-CD elements (for example DVD and DVD2) should not be sorted by their TITLE subelement
user834929 asked Jul 08 '11 08:07


3 Answers

Is this a job for the identity transform? It can be used to copy XML whose schema is not known:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

I think all you need is to add a new template that matches the CATALOG element, and then take some overriding action in it (in your case, sorting the CD elements):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="@*|node()">
      <xsl:copy>
         <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
   </xsl:template>

   <xsl:template match="CATALOG">
      <xsl:copy>
         <xsl:apply-templates select="@*" />
         <xsl:apply-templates select="CD">
            <xsl:sort select="TITLE"/>
         </xsl:apply-templates>
         <xsl:apply-templates select="*[local-name() != 'CD']" />
      </xsl:copy>
   </xsl:template>
</xsl:stylesheet>

So, when matching CATALOG, you can still copy any attributes and any non-CD child elements without explicitly knowing their names. Note that if there are, for example, DVD elements under CATALOG, they will all be moved after the sorted CD elements.
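
One thing to watch with the stylesheet above: match="CATALOG" also matches the CATALOG nested inside PRICE, and selecting only element children drops any comments sitting directly under CATALOG. A minimal sketch of a variant that addresses both, assuming whitespace-only text between elements is insignificant (the match="/CATALOG" pattern, the node() selection, and the strip-space and output settings are my assumptions, not part of the original answer):

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <!-- Assumption: whitespace-only text nodes carry no meaning, so drop them
        and let the serializer re-indent the result -->
   <xsl:strip-space elements="*"/>
   <xsl:output method="xml" indent="yes"/>

   <!-- Identity template: copies every node that no other template handles -->
   <xsl:template match="@*|node()">
      <xsl:copy>
         <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
   </xsl:template>

   <!-- Matches only the document element; a CATALOG nested deeper
        falls through to the identity template and keeps its order -->
   <xsl:template match="/CATALOG">
      <xsl:copy>
         <xsl:apply-templates select="@*"/>
         <xsl:apply-templates select="CD">
            <xsl:sort select="TITLE"/>
         </xsl:apply-templates>
         <!-- node() rather than * so that comments and
              processing instructions are copied as well -->
         <xsl:apply-templates select="node()[not(self::CD)]"/>
      </xsl:copy>
   </xsl:template>
</xsl:stylesheet>
```

With the sample input from the question, this should output the sorted CD elements followed by DVD2 and DVD, and leave the nested E, I, D list in its original order.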

Tim C answered Nov 03 '22 22:11


Staying with the approach of just modifying the identity transformation (which might not be really safe), I think the following should be equivalent to @Tim's answer.

NOTE: I'm not recommending this technique unless you understand the general behavior of the identity transformation.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* 
                | node()[not(self::CD[parent::CATALOG])]"/>
            <xsl:apply-templates select="CD[parent::CATALOG]">
                <xsl:sort select="TITLE"/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

or, if you want the other elements (DVD and DVD2) to come after the sorted CD elements, you can do:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:apply-templates select="CD[parent::CATALOG]">
                <xsl:sort select="TITLE"/>
            </xsl:apply-templates>
            <xsl:apply-templates select="node()
                [not(self::CD[parent::CATALOG])]"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
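
Both stylesheets above test parent::CATALOG, which is also true for the CD elements of the CATALOG nested inside PRICE, so those get sorted as well. A hedged sketch of the second variant restricted to the document element, using the extra predicate not(parent::*) (my addition, not part of the original answer):

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <!-- Sorted pass: selects something only when the current element
                 is a CATALOG whose parent is the root node, not an element -->
            <xsl:apply-templates select="CD[parent::CATALOG[not(parent::*)]]">
                <xsl:sort select="TITLE"/>
            </xsl:apply-templates>
            <!-- Everything else, including CD elements of a nested CATALOG,
                 is copied in document order -->
            <xsl:apply-templates select="node()
                [not(self::CD[parent::CATALOG[not(parent::*)]])]"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

If the processor preserves whitespace-only text nodes, those of the top-level CATALOG are all emitted after the sorted CD elements, so the indentation of the output changes, but no content is lost.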
Emiliano Poggi answered Nov 03 '22 23:11


To copy all attributes of the CATALOG element, you can write:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>

    <xsl:template match="CATALOG">
        <xsl:copy>
            <xsl:copy-of select="@*"/>

            <xsl:copy-of select="CD[1]/preceding-sibling::*"/>
            <xsl:for-each select="CD">
                <xsl:sort select="TITLE"/>
                <xsl:copy-of select="."/>
            </xsl:for-each>
            <xsl:copy-of select="CD[last()]/following-sibling::*"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

Result:

<CATALOG atr1="value1" atr2="value2">
    <DVD>
        <FORMAT>DVD-9</FORMAT>
    </DVD>
    <CD>
        <TITLE>1999 Grammy Nominees</TITLE>
        <ARTIST>Many</ARTIST>
        <COUNTRY>USA</COUNTRY>
        <COMPANY>Grammy</COMPANY>
        <PRICE>10.20</PRICE>
        <YEAR>1999</YEAR>
    </CD>
    <CD>
        <TITLE>Big Willie style</TITLE>
        <ARTIST>Will Smith</ARTIST>
        <COUNTRY>USA</COUNTRY>
        <COMPANY>Columbia</COMPANY>
        <PRICE>9.90</PRICE>
        <YEAR>1997</YEAR>
    </CD>
    ...
    <BLUERAY>
        <TITLE>Contact</TITLE>
        <YEAR>1997</YEAR>
    </BLUERAY>
</CATALOG>
Grzegorz Szpetkowski answered Nov 03 '22 22:11