Split a large XML file with XSLT 2.0

I have this source XML file:

<DATA>
  <DATASET>
    <KE action="create">
      <A>USVa</A>
      <B>USVb</B>
      <C>USV10</C>
    </KE>
    <KE>
      ....
    </KE>
  </DATASET>
</DATA>

The element "KE" occurs roughly 30,000 times. I want to start a new XML file after every 5,000 KE elements. For 30,000 KE elements, the result must be 6 separate XML files, each with the same structure as the source XML.

How can I do this with XSLT 2.0? I'm using saxonhe9-5-1-3j. Many thanks!

user3123034 asked Mar 21 '23 23:03


1 Answer

Use the XSLT 2.0 instruction xsl:for-each-group with group-starting-with, and use the position of each KE element modulo the chunk size to decide where a new group starts. Then write each group to its own output document with the xsl:result-document element.
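To make the grouping rule concrete, here is a small Python model of how group-starting-with partitions a sequence. It is only an illustration of the semantics, not how Saxon implements it, and the function name group_starting_with is hypothetical: walking the KE elements in document order with a 1-based position, a new group starts whenever the pattern matches, and the first item always opens a group.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def group_starting_with(items: Sequence[T],
                        starts_group: Callable[[int], bool]) -> List[List[T]]:
    """Hypothetical model of @group-starting-with over 1-based positions."""
    groups: List[List[T]] = []
    for position, item in enumerate(items, start=1):
        # The first item always begins a group, as in XSLT 2.0
        if not groups or starts_group(position):
            groups.append([])
        groups[-1].append(item)
    return groups


chunk = 3  # the sample stylesheet's chunk size; 5000 in the question
kes = [f"KE{n}" for n in range(1, 8)]
groups = group_starting_with(kes, lambda p: (p - 1) % chunk == 0)
print(groups)
# [['KE1', 'KE2', 'KE3'], ['KE4', 'KE5', 'KE6'], ['KE7']]
```

Inside xsl:for-each-group, position() is the number of the current group (1, 2, 3, ...), which is what the stylesheet uses to name the files ke1.xml, ke2.xml, and so on.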

My sample stylesheet creates a new result document for each group of 3 KE elements. Change 3 to 5000 for your input XML.
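With a chunk size of 5000, exactly 30,000 KE elements give six full files; if the real count is not an exact multiple of 5000, the last group (and therefore the last file) is simply shorter. A quick check of that arithmetic, written in Python purely for illustration (chunk_sizes is a hypothetical helper):

```python
def chunk_sizes(total: int, chunk_size: int) -> list:
    """Hypothetical helper: how many KE elements each output file would hold."""
    full, rest = divmod(total, chunk_size)
    # `full` files are completely filled; a non-zero remainder needs one more file
    return [chunk_size] * full + ([rest] if rest else [])


print(len(chunk_sizes(30000, 5000)))  # 6
print(chunk_sizes(30001, 5000)[-1])   # 1: the 30,001st KE ends up alone in a 7th file
```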

Stylesheet

Edit 1: Simplified the stylesheet, thanks to @Martin Honnen. Edit 2: Edited again, as suggested by @michael.hor257k.

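<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Process only DATASET, so whitespace text nodes under DATA are not copied to the principal output -->
  <xsl:template match="/DATA">
    <xsl:apply-templates select="DATASET"/>
  </xsl:template>

  <xsl:template match="DATASET">
    <!-- A new group starts at KE number 1, 4, 7, ... (change 3 to 5000 for 5000 KE per file) -->
    <xsl:for-each-group select="KE" group-starting-with="KE[(position() - 1) mod 3 = 0]">
      <!-- Here position() is the number of the current group: ke1.xml, ke2.xml, ... -->
      <xsl:variable name="file" select="concat('ke', position(), '.xml')"/>
      <xsl:result-document href="{$file}">
        <!-- Rebuild the source's DATA/DATASET wrapper around the current group -->
        <DATA>
          <DATASET>
            <xsl:copy-of select="current-group()"/>
          </DATASET>
        </DATA>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>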

You get the following output. (I have numbered the KE elements with an n attribute for convenience; the stylesheet does not rely on it.)

Output: ke1.xml

<?xml version="1.0" encoding="UTF-8"?>
<DATA>
 <DATASET>
  <KE n="1" action="create">
     <A>USVa</A>
     <B>USVb</B>
     <C>USV10</C>
  </KE>
  <KE n="2" action="create">
     <A>USVa</A>
     <B>USVb</B>
     <C>USV10</C>
  </KE>
  <KE n="3" action="create">
     <A>USVa</A>
     <B>USVb</B>
     <C>USV10</C>
  </KE>
 </DATASET>
</DATA>

Output: ke2.xml

<?xml version="1.0" encoding="UTF-8"?>
<DATA>
 <DATASET>
  <KE n="4" action="create">
     <A>USVa</A>
     <B>USVb</B>
     <C>USV10</C>
  </KE>
  <KE n="5" action="create">
     <A>USVa</A>
     <B>USVb</B>
     <C>USV10</C>
  </KE>
  <KE n="6" action="create">
     <A>USVa</A>
     <B>USVb</B>
     <C>USV10</C>
  </KE>
 </DATASET>
</DATA>

The remaining output documents have the same structure, each holding the next group of KE elements.
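To check a real run, a short script can confirm that the split files together hold exactly the source's KE elements, in order, with the chunk size per file (only the last file may be shorter). This is a Python sketch using the standard library's xml.etree.ElementTree, not part of the XSLT solution; the helper names ke_key and split_is_correct are hypothetical, and it assumes, as in the sample, that each KE has only simple text children (A, B, C), so it compares attributes plus child tag and text and ignores indentation whitespace.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def ke_key(ke: ET.Element) -> Tuple:
    """Whitespace-insensitive summary of a KE element (attributes + child text)."""
    return (sorted(ke.attrib.items()),
            [(child.tag, (child.text or "").strip()) for child in ke])


def split_is_correct(source_xml: str, parts_xml: List[str], chunk_size: int) -> bool:
    """True if the parts hold the source's KE elements, in order, chunk_size per part."""
    expected = [ke_key(ke) for ke in ET.fromstring(source_xml).findall("./DATASET/KE")]
    parts = [[ke_key(ke) for ke in ET.fromstring(p).findall("./DATASET/KE")]
             for p in parts_xml]
    # Every part except the last must be full; the last may be shorter but not empty
    if any(len(p) != chunk_size for p in parts[:-1]):
        return False
    if parts and not 0 < len(parts[-1]) <= chunk_size:
        return False
    return [k for p in parts for k in p] == expected


def wrap(body: str) -> str:
    """Wrap KE markup in the DATA/DATASET structure used by the question."""
    return f"<DATA><DATASET>{body}</DATASET></DATA>"


# Tiny demo: 5 KE elements split into 3 + 2
kes = [f'<KE n="{n}"><A>USVa</A></KE>' for n in range(1, 6)]
source = wrap("".join(kes))
print(split_is_correct(source, [wrap("".join(kes[:3])), wrap("".join(kes[3:]))], 3))  # True
print(split_is_correct(source, [wrap("".join(kes[:2])), wrap("".join(kes[2:]))], 3))  # False
```

For real output, read the source file and ke1.xml, ke2.xml, ... into strings in numeric order (note that ke10.xml sorts before ke2.xml alphabetically) and call split_is_correct(source, parts, 5000).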

Mathias Müller answered Apr 02 '23 13:04