We export “records” to an XML file; one of our customers has complained that the file is too big for their other system to process. Therefore I need to split up the file, while repeating the “header section” in each of the new files.
So I am looking for something that will let me define some XPaths for the section(s) that should always be output, and another XPath for the “rows”, with parameters that say how many rows to put in each file and how to name the files.
Before I start writing some custom .NET code to do this, is there a standard command-line tool that works on Windows and does this?
(As I know how to program in C#, I am more inclined to write code than to mess about with complex XSL etc., but an "off-the-shelf" solution would be better than custom code.)
First, download the foxe XML editor from http://www.firstobject.com/foxe242.zip
Then watch the video at http://www.firstobject.com/xml-splitter-script-video.htm, which explains how the split code works.
That page includes a script (it starts with split()). Copy it, choose "New Program" from the File menu in the XML editor, paste the code, and save it. The code is:
split()
{
  CMarkup xmlInput, xmlOutput;
  xmlInput.Open( "**50MB.xml**", MDF_READFILE );
  int nObjectCount = 0, nFileCount = 0;
  while ( xmlInput.FindElem("//**ACT**") )
  {
    if ( nObjectCount == 0 )
    {
      ++nFileCount;
      xmlOutput.Open( "**piece**" + nFileCount + ".xml", MDF_WRITEFILE );
      xmlOutput.AddElem( "**root**" );
      xmlOutput.IntoElem();
    }
    xmlOutput.AddSubDoc( xmlInput.GetSubDoc() );
    ++nObjectCount;
    if ( nObjectCount == **5** )
    {
      xmlOutput.Close();
      nObjectCount = 0;
    }
  }
  if ( nObjectCount )
    xmlOutput.Close();
  xmlInput.Close();
  return nFileCount;
}
Change the fields marked in bold (or wrapped in ** **) to suit your needs; this is also explained on the video page.
Right-click in the XML editor window and choose Run (or just press F9). The output bar in that window shows the number of files generated.
Note:
The input file name can be a full path such as "C:\\Users\\AUser\\Desktop\\a_xml_file.xml" (with doubled backslashes),
and the output file can be "C:\\Users\\AUser\\Desktop\\anoutputfolder\\piece" + nFileCount + ".xml".
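For readers who would rather not install an editor, the same grouping logic can be sketched in Python with only the standard library. This is an illustration, not the foxe script's implementation: the function name split_records and its parameters are made up for this example, and unlike the script above it parses the whole input into memory instead of streaming it, so it assumes the file fits in RAM.

```python
import xml.etree.ElementTree as ET


def split_records(xml_text, record_tag, per_file, root_tag="root"):
    """Group every <record_tag> element into pieces of at most per_file records.

    Returns a list of XML strings, one per output piece. Hypothetical helper:
    the caller decides how to name and write the files.
    """
    source = ET.fromstring(xml_text)
    pieces = []
    current = None
    # iter() walks the tree in document order and yields matching elements
    # at any depth, roughly like the "//ACT" path in the script above.
    for record in source.iter(record_tag):
        if current is None or len(current) == per_file:
            # Start a new piece with its own wrapper element.
            current = ET.Element(root_tag)
            pieces.append(current)
        current.append(record)
    return [ET.tostring(piece, encoding="unicode") for piece in pieces]


if __name__ == "__main__":
    sample = "<log><ACT id='1'/><ACT id='2'/><ACT id='3'/></log>"
    for number, piece in enumerate(split_records(sample, "ACT", 2), start=1):
        # Mirrors the piece1.xml, piece2.xml naming used by the script.
        with open("piece%d.xml" % number, "w", encoding="utf-8") as out:
            out.write(piece)
```

As with the script, the record tag, the wrapper element name, and the number of records per piece are the values to adjust.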
There's no general-purpose solution to this, because there are so many different possible ways that your source XML could be structured.
It's reasonably straightforward to build an XSLT transform that will output a slice of an XML document. For instance, given this XML:
<header>
  <data rec="1"/>
  <data rec="2"/>
  <data rec="3"/>
  <data rec="4"/>
  <data rec="5"/>
  <data rec="6"/>
</header>
you can output a copy of the file containing only the data elements within a certain range with this XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:param name="startPosition"/>
  <xsl:param name="endPosition"/>
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="header">
    <xsl:copy>
      <xsl:apply-templates select="data"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="data">
    <xsl:if test="position() &gt;= $startPosition and position() &lt;= $endPosition">
      <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
(Note, by the way, that because this is based on the identity transform, it works even if header isn't the top-level element.)
You still need to count the data elements in the source XML and run the transform repeatedly, passing the values of $startPosition and $endPosition that are appropriate for each output file.
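The counting step might look something like the sketch below, which works out the (startPosition, endPosition) pair for each run. The helper name position_ranges and its parameters are made up for this example; it assumes the row elements are direct children of the document's root element, as in the sample above, and it leaves the actual XSLT invocation to whichever processor you use, since the way parameters are passed differs between processors.

```python
import xml.etree.ElementTree as ET


def position_ranges(xml_text, row_tag, rows_per_file):
    """Return 1-based, inclusive (start, end) pairs, one per output file.

    The pairs match XPath's position() numbering, so each one can be passed
    to the stylesheet as startPosition and endPosition.
    """
    root = ET.fromstring(xml_text)
    # findall with a bare tag name matches direct children of the root only.
    total = len(root.findall(row_tag))
    return [
        (start, min(start + rows_per_file - 1, total))
        for start in range(1, total + 1, rows_per_file)
    ]


if __name__ == "__main__":
    sample = (
        "<header>"
        '<data rec="1"/><data rec="2"/><data rec="3"/>'
        '<data rec="4"/><data rec="5"/><data rec="6"/>'
        "</header>"
    )
    for start, end in position_ranges(sample, "data", 4):
        print(start, end)
```

With the six-row sample and four rows per file, this gives two runs: positions 1 to 4, then 5 to 6.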