How to combine large XML files using MSXML SAX in Delphi

Edit: My (incomplete and very rough) XmlLite header translation is available on GitHub

What is the best way to simply combine massive XML documents in Delphi with MSXML without using the DOM? Should I use the COM components SAXReader and XMLWriter, and are there any good examples?

The transformation simply combines all the Contents elements under the root (Container) of many big files (60 MB+) into one huge file (~1 GB).

<Container>
    <Contents />
    <Contents />
    <Contents />
</Container>

I have it working in the following C# code using an XmlWriter and XmlReaders, but it needs to happen in a native Delphi process:

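using System.Xml;

class Program
{
    static void Main()
    {
        var files = new string[] { @"c:\bigFile1.xml", @"c:\bigFile2.xml", @"c:\bigFile3.xml", @"c:\bigFile4.xml", @"c:\bigFile5.xml", @"c:\bigFile6.xml" };

        using (var writer = XmlWriter.Create(@"c:\HugeOutput.xml", new XmlWriterSettings { Indent = true }))
        {
            writer.WriteStartElement("Container");

            foreach (var inputFile in files)
                using (var reader = XmlReader.Create(inputFile))
                {
                    reader.MoveToContent();
                    while (!reader.EOF)
                    {
                        // WriteNode copies the whole Contents subtree and leaves the reader on the next
                        // sibling, so Read() is only called when nothing was copied. A plain
                        // while (reader.Read()) loop would skip a Contents that follows the previous one
                        // with no whitespace in between.
                        if (reader.IsStartElement("Contents"))
                            writer.WriteNode(reader, true);
                        else
                            reader.Read();
                    }
                }

            writer.WriteEndElement(); //End the Container element
        }
    }
}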

We already use MSXML DOM in other parts of the system and I do not want to add new components if possible.

asked Aug 04 '11 at 14:08 by carlmon


3 Answers

XmlLite is a native C++ port of the XML reader and writer from System.Xml, and it provides the same pull-parsing programming model. It ships in the box with W2K3 SP2, WinXP SP3 and above. You'll need a Delphi header translation first; after that, the C# code maps almost 1:1 to Delphi.

answered Nov 20 '22 at 04:11 by Samuel Zhang


I'd just use regular file I/O to writeln a <Container> to a text file, writeln each of the contents as a string, and finally writeln </Container>. If the total size were more reasonable, I'd assemble everything in a TStringList and then stream that to disk. But once you're into GB territory, holding everything in memory like that would be risky.
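A minimal sketch of this plain-text approach, written in C++ rather than Delphi; the same three steps (write the opening tag, copy each file's inner contents, write the closing tag) translate directly to Delphi file I/O. It assumes every input is UTF-8 with an unprefixed <Container> root, and `innerContent` and `combineFiles` are hypothetical helper names, not part of any library. Unlike the TStringList variant, each file's inner text is written straight to the output stream, so only one input file (tens of MB) is in memory at a time.

```cpp
#include <cctype>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Returns the raw text between the root start tag and the root end tag of one
// input document. This is text splicing, not XML parsing. Assumptions: the root
// is unprefixed (<Container>, not <ns:Container>), no '>' occurs inside its
// attribute values, the end tag is written exactly as </Container>, and the tag
// text does not appear inside a comment or CDATA section.
std::string innerContent(const std::string& doc, const std::string& root) {
    const std::string open = "<" + root;
    std::size_t start = doc.find(open);
    // Skip longer names that merely share the prefix, e.g. <ContainerInfo>.
    while (start != std::string::npos) {
        const std::size_t after = start + open.size();
        const char next = after < doc.size() ? doc[after] : '\0';
        if (next == '>' || next == '/' || std::isspace(static_cast<unsigned char>(next))) break;
        start = doc.find(open, after);
    }
    if (start == std::string::npos) throw std::runtime_error("root start tag not found");

    const std::size_t gt = doc.find('>', start);
    if (gt == std::string::npos) throw std::runtime_error("unterminated root start tag");
    if (doc[gt - 1] == '/') return "";  // self-closing root such as <Container/>

    const std::size_t end = doc.rfind("</" + root + ">");
    if (end == std::string::npos || end < gt) throw std::runtime_error("root end tag not found");
    return doc.substr(gt + 1, end - gt - 1);
}

// Writes <Container>, then the inner text of every input file, then </Container>.
// The output has no XML declaration, so it defaults to UTF-8; this assumes all
// inputs are UTF-8 too. Any XML declaration or DOCTYPE in the inputs is dropped.
void combineFiles(const std::vector<std::string>& paths, const std::string& outPath) {
    std::ofstream out(outPath, std::ios::binary);
    if (!out) throw std::runtime_error("cannot create " + outPath);
    out << "<Container>";
    for (const std::string& path : paths) {
        std::ifstream in(path, std::ios::binary);
        if (!in) throw std::runtime_error("cannot open " + path);
        std::ostringstream buffer;
        buffer << in.rdbuf();  // one whole input file in memory at a time
        out << innerContent(buffer.str(), "Container");
    }
    out << "</Container>\n";
}
```

Calling `combineFiles({"c:\\bigFile1.xml", "c:\\bigFile2.xml"}, "c:\\HugeOutput.xml")` mirrors the question's C# loop. The trade-off is that nothing is validated: it is fast because it never parses the Contents elements, but a SAX or pull reader is safer if the inputs may use other encodings, namespaces, or comments around the root tags.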

answered Nov 20 '22 at 03:11 by Chris Thornton


libxml2 with the Delphi wrapper Libxml2 might be an option (found here). It has some SAX support and seems to be very solid: the web page mentions that libxml2 passed all 1800+ tests from the OASIS XML Tests Suite. See also: Is there a SAX Parser for Delphi and Free Pascal?

answered Nov 20 '22 at 03:11 by mjn