How to append to large XML files in C# using memory efficiently

Is there some way I can combine two XmlDocuments without holding the first in memory?

I have to cycle through a list of up to a hundred large (~300MB) XML files, appending up to 1000 nodes to each, and repeat the whole process several times (the new node list is cleared between passes to save memory). Currently I load the whole XmlDocument into memory before appending new nodes, which is not tenable.

What would you say is the best way to go about this? I have a few ideas but I'm not sure which is best:

  1. Never load the whole XmlDocument, instead using XmlReader and XmlWriter simultaneously to write to a temp file which is subsequently renamed.
  2. Make an XmlDocument for the new nodes only, and then manually write it to the existing file (e.g. file.WriteLine( "<node>\n" )).
  3. Something else?

Any help will be much appreciated.

Edit: Some more details in answer to some of the comments:

The program parses several large logs into XML, grouping them into different files by source. Once the XML is written, a lightweight proprietary reader program produces reports on the data. The program only needs to run once a day, so it can be slow, but it runs on a server that performs other tasks, mainly file compression and transfer, which must not be affected too much.

A database would probably be easier, but the company isn't going to do this any time soon!

As is, the program runs on the dev machine using a few GB of memory at most, but throws out-of-memory exceptions when run on the server.

Final Edit: The task is quite low-priority, which is why it would only cost extra to get a database (though I will look into mongo).

The file will only be appended to, and won't grow indefinitely - each final file is only for a day's worth of the log, and then new files are generated the following day.

I'll probably use the XmlReader/Writer method since it will be easiest to ensure XML validity, but I have taken all your comments/answers into consideration. I know that having XML files this large is not a particularly good solution, but it's what I'm limited to, so thanks for all the help given.

Overlord_Dave asked Nov 03 '22 18:11



1 Answer

If you wish to be completely certain of the XML structure, using XmlWriter and XmlReader is the best way to go.
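As a rough illustration of that approach, here is a minimal sketch that streams an existing file into a temp file, writes the new nodes just before the root's end tag, and then swaps the temp file into place. The names `XmlAppender`, `AppendNodes`, `writeNewNodes` and the `.tmp` suffix are hypothetical, and the sketch assumes the file has a single root element with no DTD; anything before the root (comments, processing instructions) is not copied.

```csharp
using System;
using System.IO;
using System.Xml;

static class XmlAppender
{
    // Hypothetical helper: streams `path` into a temp file, lets the caller
    // write extra nodes just before the root's end tag, then swaps the files.
    public static void AppendNodes(string path, Action<XmlWriter> writeNewNodes)
    {
        string tempPath = path + ".tmp"; // assumed temp-file naming

        using (XmlReader reader = XmlReader.Create(path))
        using (XmlWriter writer = XmlWriter.Create(tempPath))
        {
            // Skips the declaration and any comments before the root element
            // (they are not copied in this sketch).
            reader.MoveToContent();
            int rootDepth = reader.Depth;
            bool rootIsEmpty = reader.IsEmptyElement;

            // Re-create the root start tag with its attributes.
            writer.WriteStartDocument();
            writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
            writer.WriteAttributes(reader, false);
            reader.MoveToElement();

            if (!rootIsEmpty)
            {
                reader.Read(); // step onto the first node inside the root
                while (!reader.EOF && reader.Depth > rootDepth)
                {
                    // Copies the current node (and its subtree) and advances
                    // the reader to the node that follows it.
                    writer.WriteNode(reader, false);
                }
            }

            writeNewNodes(writer);     // new nodes land before the root's end tag
            writer.WriteEndElement();  // closes the root element
            writer.WriteEndDocument();
        }

        // Swap the temp file into place (no backup copy is kept).
        File.Replace(tempPath, path, null);
    }
}
```

A call might look like `XmlAppender.AppendNodes("source1.xml", w => w.WriteElementString("entry", "text"));`, where the file and element names are placeholders. Only one child subtree of the root is in flight at a time, so memory use should stay small as long as the individual nodes are small.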

However, for the highest possible performance, you could do the same job with plain string and stream operations instead. You would lose the ability to verify the XML structure, so an error in one input file would be copied into the output undetected:

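// Requires: using System; using System.IO;
// Assumes `files` (input paths) and `max_performance` (bool) are defined by the caller.
using (StreamWriter sw = new StreamWriter("out.xml")) {
    foreach (string filename in files) {
        // Note: filename is written unescaped; escape it if it may contain &, < or "
        sw.Write(String.Format(@"<inputfile name=""{0}"">", filename));
        using (StreamReader sr = new StreamReader(filename)) {
            if (max_performance) {
                // StreamReader has no CopyTo() (.NET 4's CopyTo() is on Stream;
                // alternatively try http://bit.ly/RiovFX), so copy in character blocks
                char[] buffer = new char[81920];
                int read;
                while ((read = sr.Read(buffer, 0, buffer.Length)) > 0) {
                    sw.Write(buffer, 0, read);
                }
            } else {
                string line;
                while ((line = sr.ReadLine()) != null) {
                    // parse the line and make any modifications you want
                    sw.Write(line);
                    sw.Write("\n");
                }
            }
        }
        sw.Write("</inputfile>");
    }
}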

Depending on the way your input XML files are structured, you might opt to remove the XML headers, maybe the document element, or a few other unnecessary structures. You could do that by parsing the file line by line.
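A minimal sketch of that line-by-line filtering, assuming the XML declaration and the root element's open and close tags each sit on their own line and the root tag has no attributes. The names `XmlLineFilter`, `CopyWithoutWrapper` and `rootName` are hypothetical:

```csharp
using System;
using System.IO;

static class XmlLineFilter
{
    // Copies lines from input to output, dropping the XML declaration and
    // any line that consists only of the root element's open or close tag.
    public static void CopyWithoutWrapper(TextReader input, TextWriter output, string rootName)
    {
        string openTag = "<" + rootName + ">";
        string closeTag = "</" + rootName + ">";

        string line;
        while ((line = input.ReadLine()) != null)
        {
            string trimmed = line.Trim();

            // Skip the <?xml ... ?> header line.
            if (trimmed.StartsWith("<?xml", StringComparison.Ordinal)) continue;

            // Skip the document element's own tags (exact match only).
            if (trimmed == openTag || trimmed == closeTag) continue;

            output.Write(line);
            output.Write("\n");
        }
    }
}
```

Because this is plain string matching, it does nothing to check that the remaining content is well-formed; a file laid out differently (for example, with the root tag sharing a line with its first child) would need a different rule.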

Ted Spence answered Nov 09 '22 23:11
