
Performance: XmlReader or LINQ to XML

I have a 150 MB XML file that serves as the database for my project. I currently read it with XmlReader, and I want to know whether XmlReader or LINQ to XML is the better choice for this scenario.

Note that I search this XML for an item and display the search result, so a search can take a long time or just a moment.

Nasser Hadjloo asked Apr 29 '10 07:04



3 Answers

If you want performance, use XmlReader. It doesn't read the whole file and build a DOM tree in memory. Instead, it reads the file from disk and hands you each node as it encounters it.
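As a minimal sketch of that forward-only behavior, the snippet below searches for the first matching element and stops reading as soon as it is found, so the rest of the file is never parsed. The helper name `FindFirst`, the element name `item`, and the attribute name `id` are assumptions made for illustration; they are not taken from the question's actual file.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: returns the text content of the first <elementName>
// element whose attribute `attrName` equals `wanted`, or null if none matches.
static string FindFirst(TextReader input, string elementName, string attrName, string wanted)
{
    using (var reader = XmlReader.Create(input))
    {
        // ReadToFollowing advances node by node; nothing is kept in memory
        // apart from the current node.
        while (reader.ReadToFollowing(elementName))
        {
            if (reader.GetAttribute(attrName) == wanted)
                return reader.ReadElementContentAsString(); // stop at the first hit
        }
    }
    return null;
}

// Usage with a small in-memory document (assumed structure).
var sample = "<items><item id=\"1\">apple</item><item id=\"2\">pear</item></items>";
Console.WriteLine(FindFirst(new StringReader(sample), "item", "id", "2"));
```

For a file on disk, `new StreamReader(path)` could be passed instead of the `StringReader`. How quickly the search returns then depends on how early in the file the match appears.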

A quick Google search turned up a performance comparison of XmlReader, LINQ to XML and XDocument.Load:

https://web.archive.org/web/20130517114458/http://www.nearinfinity.com/blogs/joe_ferner/performance_linq_to_sql_vs.html

Tebari answered Nov 16 '22 02:11


I would personally look at using LINQ to XML with the streaming techniques outlined in the Microsoft documentation: http://msdn.microsoft.com/en-us/library/system.xml.linq.xstreamingelement.aspx#Y1392

Here's a quick benchmark that reads from a 200 MB XML file with a simple filter:

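// requires: using System; using System.Collections.Generic; using System.Linq; using System.Xml; using System.Xml.Linq;
var xmlFilename = "test.xml";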

//create test xml file
var initMemoryUsage = GC.GetTotalMemory(true);
var timer = System.Diagnostics.Stopwatch.StartNew();
var rand = new Random();
// To stream the XML output, XStreamingElement must be used for every parent element of the collection, so no XDocument here.
var testDoc = new XStreamingElement("root",
    Enumerable.Range(1, 10000000).Select(idx => new XElement("child", new XAttribute("id", rand.Next(0, 1000))))
);
testDoc.Save(xmlFilename);
var outStat = String.Format("{0:f2} sec {1:n0} kb //linq to xml output streamed", timer.Elapsed.TotalSeconds, (GC.GetTotalMemory(false) - initMemoryUsage) / 1024);

//linq to xml not streamed
initMemoryUsage = GC.GetTotalMemory(true);
timer.Restart();
var col1 = XDocument.Load(xmlFilename).Root.Elements("child").Where(e => (int)e.Attribute("id") < 10).Select(e => (int)e.Attribute("id")).ToArray();
var stat1 = String.Format("{0:f2} sec {1:n0} kb //linq to xml input not streamed", timer.Elapsed.TotalSeconds, (GC.GetTotalMemory(false) - initMemoryUsage) / 1024);

//xmlreader
initMemoryUsage = GC.GetTotalMemory(true);
timer.Restart();
var col2 = new List<int>();
using (var reader = new XmlTextReader(xmlFilename))
{
    while (reader.ReadToFollowing("child"))
    {
        reader.MoveToAttribute("id");
        int value = Convert.ToInt32(reader.Value);
        if (value < 10)
            col2.Add(value);
    }
}
var stat2 = String.Format("{0:f2} sec {1:n0} kb //xmlreader", timer.Elapsed.TotalSeconds, (GC.GetTotalMemory(false) - initMemoryUsage) / 1024);

//linq to xml streamed
initMemoryUsage = GC.GetTotalMemory(true);
timer.Restart();
var col3 = StreamElements(xmlFilename, "child").Where(e => (int)e.Attribute("id") < 10).Select(e => (int)e.Attribute("id")).ToArray();
var stat3 = String.Format("{0:f2} sec {1:n0} kb //linq to xml input streamed", timer.Elapsed.TotalSeconds, (GC.GetTotalMemory(false) - initMemoryUsage) / 1024);

//util method
public static IEnumerable<XElement> StreamElements(string filename, string elementName)
{
    using (var reader = XmlReader.Create(filename)) // Create is a static method defined on XmlReader
    {
        while (reader.Name == elementName || reader.ReadToFollowing(elementName))
            yield return (XElement)XElement.ReadFrom(reader);
    }
}

And here's the processing time and memory usage on my machine:

11.49 sec 225 kb      // linq to xml output streamed

17.36 sec 782,312 kb  // linq to xml input not streamed
6.52 sec 1,825 kb     // xmlreader
11.74 sec 2,238 kb    // linq to xml input streamed
Michael answered Nov 16 '22 01:11


Write a few benchmark tests to establish exactly what the situation is for you, and take it from there. Keep in mind that LINQ to XML introduces a lot of flexibility.
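One possible shape for such a test is a small timing helper built on `Stopwatch` and `GC.GetTotalMemory`. This is only a rough sketch: the name `Measure` is hypothetical, and the memory figure is the change in managed heap size around a single run, which is a coarse indicator rather than a precise measurement.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: runs `work` once, prints the elapsed time and the change
// in managed heap size, and returns both values.
static (double Seconds, long KiloBytes) Measure(string label, Action work)
{
    long before = GC.GetTotalMemory(true);   // force a collection for a cleaner baseline
    var timer = Stopwatch.StartNew();
    work();
    timer.Stop();
    long deltaKb = (GC.GetTotalMemory(false) - before) / 1024;
    Console.WriteLine("{0:f2} sec {1:n0} kb //{2}", timer.Elapsed.TotalSeconds, deltaKb, label);
    return (timer.Elapsed.TotalSeconds, deltaKb);
}

// Usage: wrap each candidate (XmlReader loop, XDocument.Load query, ...) in a lambda.
Measure("placeholder workload", () =>
{
    long sum = 0;
    for (int i = 0; i < 1000000; i++) sum += i;
});
```

Running each candidate several times against your own file, and discarding the first run, should give a steadier picture than a single measurement.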

Martin Milan answered Nov 16 '22 02:11