Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

High speed XSLT to convert from XML to CSV

I have been trying to get CSV data from an XML document in most efficient time using XSLT. Following is my sample XML

<?xml version="1.0" encoding="ISO-8859-1"?>
 <sObjects xmlns="urn:sobject.partner.soap.sforce.com">
     <sObject>
        <Name>Raagu</Name>
        <BillingStreet>Hoskote</BillingStreet>
</sObject>
     <sObject>
        <Name>Rajath</Name>
         <BillingStreet>BTM</BillingStreet>
         <age>25</age>
</sObject>
     <sObject>
        <Name>Sarath</Name>
         <BillingStreet>Murgesh</BillingStreet>
         <location>Bangalore</location>
         <age>#N/A</age>
</sObject>
     <sObject>
         <Name>Bharath</Name>
         <BillingStreet>EGL</BillingStreet>
         <location>Bangalore</location>
             <shipping>Hoskote</Shipping>
</sObject>
     <sObject>
         <Id>12312321321</Id>
         <Name>Guru</Name>
         <location>Sirsi</location>
         <date>12-12-12</date>
</sObject>
     <sObject>
         <Name>Appa</Name>
         <BillingStreet>someStrrt</BillingStreet>
         <accountNo>213213</accountNo>
</sObject>
           <sObject>
          <Name>Sarath</Name>
          <BillingStreet>Murgesh</BillingStreet>
         <location>Bangalore</location>
</sObject>
     <sObject>
          <Name>Sarath</Name>
         <BillingStreet>Murgesh</BillingStreet>
          <location>Bangalore</location>
</sObject>
     <sObject>
          <Name>Sarath</Name>
          <BillingStreet>Murgesh</BillingStreet>
           <location>Bangalore</location>
</sObject>

and I wanted Output of this kind

 <?xml version="1.0" encoding="utf-8"?><csv xmlns="http://www.approuter.com/schemas/RootNode"><data>Name,BillingStreet,age,location,Shipping,Id,date,accountNo
Raagu,Hoskote,,,,,,
Rajath,BTM,25,,,,,
Sarath,Murgesh,#N/A,Bangalore,,,,
Bharath,EGL,,Bangalore,Hoskote,,,
Guru,,,Sirsi,,12312321321,12-12-12,
Appa,someStrrt,,,,,,213213
Sarath,Murgesh,,Bangalore,,,,
Sarath,Murgesh,,Bangalore,,,,
Sarath,Murgesh,,Bangalore,,,,</data></csv>

To get this done, I have tried following XSLT

<xsl:stylesheet version="1.0" xmlns:p0="urn:sobject.partner.soap.sforce.com" xmlns:csv="csv:csv" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output encoding="utf-8"  method="xml"/>
    <xsl:strip-space elements="*" />
    <xsl:variable name="delimiter" select="','"/>

    <xsl:key name="field" match="p0:sObject/*" use="name()"/>

<!-- variable containing the first occurrence of each field -->
    <xsl:variable name="allFields"
    select="/*/*/*[generate-id()=generate-id(key('field', name())[1])]"/>

    <xsl:template match="/">
    <!-- Output the CSV header -->
        <xsl:element name="csv" namespace="http://www.approuter.com/schemas/RootNode">
            <xsl:element name="data" namespace="http://www.approuter.com/schemas/RootNode">

                <xsl:for-each select="$allFields">
                    <xsl:value-of select="name()" />
                    <xsl:if test="position() &lt; last()">
                        <xsl:value-of select="$delimiter" />
                    </xsl:if>
                </xsl:for-each>

                <xsl:text>&#xa;    </xsl:text>

                <xsl:apply-templates select="/*/p0:sObject" />

            </xsl:element>
        </xsl:element>

    </xsl:template>

    <xsl:template match="p0:sObject">
        <xsl:variable name="this" select="." />
        <xsl:for-each select="$allFields">
            <xsl:value-of select="$this/*[name() = name(current())]" />
            <xsl:if test="position() &lt; last()">
                <xsl:value-of select="$delimiter" />
            </xsl:if>
        </xsl:for-each>
        <xsl:if test="position() &lt; last()">
            <xsl:text>&#xa;    </xsl:text>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>

The above XSLT works really well from functioanality point of view. But I am trying this to process around 10000 records. i,e 10000 instances on sObject elements each sObject will contain around 15 fields under it.

If I run this above XSLT to process these many records, it is going for toss. XSLT is taking around 20 minutes to process and giving csv data. I wanted to get this done within seconds. That is XSLT should take between 3-4 seconds to process 10k records (sObject entries) to give valid CSV data as shown above.

This is where I am stuck to enhance the XSLT and need help in modifying this XSLT to work faster.

like image 717
Raghavendra Nilekani Avatar asked Mar 05 '13 18:03

Raghavendra Nilekani


People also ask

Can XSLT transform XML to CSV?

This post shows you how to convert a simple XML file to CSV using XSLT. The following XSL Style Sheet (compatible with XSLT 1.0) can be used to transform the XML into CSV. It is quite generic and can easily be configured to handle different xml elements by changing the list of fields defined ar the beginning.

Is it possible to convert XML to CSV?

First, you can copy and enter data of the XML file and save the data as a CSV file; secondly, you can upload an XML file to the converter and convert it to CSV without opening; Finally, it allows you to enter the URL of your XML file. You can choose the method that fits you best.

Is there any benefit of converting XML to XSLT?

XSLT is commonly used to convert XML to HTML, but can also be used to transform XML documents that comply with one XML schema into documents that comply with another schema. XSLT can also be used to convert XML data into unrelated formats, like comma-delimited text or formatting languages such as troff.

How convert XML to XSLT?

Execute an XSLT transformation from an XML fileOpen an XML document in the XML editor. Associate an XSLT style sheet with the XML document. Add an xml-stylesheet processing instruction to the XML document. For example, add the following line to the document prolog: <?


1 Answers

I think this is a difficult problem. I didn't see anything obvious. Using a multi-step build is the trick: I created pass1.xsl and pass2.xsl which creates the output tree much faster.

I made a test file that had a tree size of 252097 nodes (697768 characters). Your XSL took 21sec and the below two xsls took a few seconds.

Pass 1 XSL

<xsl:stylesheet version="2.0" xmlns:p0="urn:sobject.partner.soap.sforce.com" 
    xmlns:csv="csv:csv" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output encoding="utf-8"  method="xml" indent="no"/>
    <xsl:strip-space elements="*" />
    <xsl:variable name="delimiter" select="','"/>

    <xsl:variable name="allFields">
        <xsl:for-each select="distinct-values( /*/*/*/name() )">
            <xsl:element name="{.}" />
        </xsl:for-each>    
    </xsl:variable>

    <xsl:template match="p0:sObjects">
        <xsl:element name="{local-name(.)}" namespace="urn:sobject.partner.soap.sforce.com">
            <xsl:element name="order" namespace="urn:sobject.partner.soap.sforce.com">
                <xsl:for-each select="$allFields/*">
                    <xsl:value-of select="name()" />
                    <xsl:if test="position() &lt; last()">
                        <xsl:value-of select="$delimiter" />
                    </xsl:if>
                </xsl:for-each>
            </xsl:element>

            <xsl:apply-templates select="/*/p0:sObject" />
        </xsl:element>
    </xsl:template>

    <xsl:template match="p0:sObject">
        <xsl:variable name="this" select="." />
        <xsl:element name="{local-name(.)}" namespace="urn:sobject.partner.soap.sforce.com">
            <xsl:for-each select="$allFields/*">
                <xsl:element name="{local-name(.)}" namespace="urn:sobject.partner.soap.sforce.com">
                    <xsl:value-of select="$this/*[name() = name(current())]" />
                </xsl:element>
            </xsl:for-each>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>

Pass 2 XSL

<xsl:stylesheet version="2.0" xmlns:p0="urn:sobject.partner.soap.sforce.com" 
    xmlns:csv="csv:csv" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output encoding="utf-8"  method="xml" indent="no"/>
    <xsl:strip-space elements="*" />
    <xsl:variable name="delimiter" select="','"/>

    <xsl:template match="/">
        <xsl:element name="csv" namespace="http://www.approuter.com/schemas/RootNode">
            <xsl:element name="data" namespace="http://www.approuter.com/schemas/RootNode">
                <xsl:apply-templates select="/p0:sObjects/*" />
            </xsl:element>
        </xsl:element>
    </xsl:template>

    <xsl:template match="p0:order">
        <xsl:value-of select="." /> 
    </xsl:template>

    <!-- use Michael's suggestion about using first instead of last() -->
    <xsl:template match="p0:sObject/*[ position() = 1 ]">
        <xsl:text>&#xa;    </xsl:text>
        <xsl:value-of select="." />
    </xsl:template>

    <xsl:template match="p0:sObject/*[ position() > 1 ]">
        <xsl:value-of select="$delimiter" />
        <xsl:value-of select="." />
    </xsl:template>
</xsl:stylesheet>

Pass 1 output

This is the output xml from pass 1. This would create an xml that looks like this (which is easier to parse for the next phase. See the new empty elements under sObject?):

<sObjects xmlns="urn:sobject.partner.soap.sforce.com">
   <order>Name,BillingStreet,age,location,shipping,Id,date,accountNo</order>
   <sObject>
      <Name>Raagu</Name>
      <BillingStreet>Hoskote</BillingStreet>
      <age/>
      <location/>
      <shipping/>
      <Id/>
      <date/>
      <accountNo/>
   </sObject>
</xObjects>

Finally, the result:

<csv xmlns="http://www.approuter.com/schemas/RootNode"><data>Name,BillingStreet,age,location,shipping,Id,date,accountNo
    Raagu,Hoskote,,,,,,
    Rajath,BTM,25,,,,,
...

The command I executed:

saxonb-xslt -t bigxml.xml pass1.xsl > intermediate.xml
saxonb-xslt -t intermediate.xml pass2.xsl > res.xml
like image 190
ljdelight Avatar answered Sep 18 '22 17:09

ljdelight