I have a 150 MB XML file that is used as a database in my project. Currently I'm using XmlReader
to read content from it. I want to know whether it is better to use XmlReader
or LINQ to XML for this scenario.
Note that I'm searching for an item in this XML file and displaying the search result, so a search can take a long time or just a moment.
XDocument is from the LINQ to XML API, and XmlDocument is the standard DOM-style API for XML. If you know DOM well and don't want to learn LINQ to XML, go with XmlDocument. If you're new to both, check out this page that compares the two, and pick whichever one you like the look of better.
LINQ to XML is a LINQ-enabled, in-memory XML programming interface that lets you work with XML from within the .NET programming languages and modify XML documents efficiently and easily. LINQ to XML is like the Document Object Model (DOM) in that it brings the XML document into memory.
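To make the in-memory model concrete, here is a minimal sketch of a LINQ to XML query. The class name `InMemoryQuery`, the method `IdsBelow`, and the `child`/`id` element and attribute names are only illustrative assumptions, not part of any answer above. The whole document is parsed into an `XDocument` before the query runs, which is why memory use grows with file size.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class InMemoryQuery
{
    // Hypothetical helper: parses the whole XML string into memory,
    // then returns every "id" attribute value below the given limit.
    public static List<int> IdsBelow(string xml, int limit)
    {
        // XDocument.Parse builds the complete tree in memory first.
        XDocument doc = XDocument.Parse(xml);

        return doc.Root.Elements("child")
            .Select(e => (int)e.Attribute("id")) // explicit XAttribute -> int conversion
            .Where(id => id < limit)
            .ToList();
    }
}
```

For a file on disk, `XDocument.Load(path)` plays the same role as `XDocument.Parse` here.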
If you want performance, use XmlReader. It doesn't read the whole file and build a DOM tree in memory. Instead, it reads the file from disk and hands you each node as it encounters it.
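As a rough sketch of that forward-only approach applied to the search scenario in the question, the helper below stops reading as soon as it finds a match. The names `XmlSearch` and `FindFirst` are hypothetical, and it assumes the matching element contains only text (no child elements).

```csharp
using System.IO;
using System.Xml;

public static class XmlSearch
{
    // Hypothetical helper: returns the text of the first element whose
    // attribute equals the wanted value, or null when nothing matches.
    // Assumes the matching element has text content only.
    public static string FindFirst(TextReader input, string elementName,
                                   string attributeName, string wanted)
    {
        using (var reader = XmlReader.Create(input))
        {
            // ReadToFollowing skips forward node by node; nothing is kept in memory.
            while (reader.ReadToFollowing(elementName))
            {
                if (reader.GetAttribute(attributeName) == wanted)
                    return reader.ReadElementContentAsString(); // stop at the first hit
            }
        }
        return null;
    }
}
```

For a file on disk you could pass `File.OpenText(path)` as the input. Because the method returns at the first match, a hit near the start of the file is found almost immediately, while a miss still scans the whole file.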
With a quick Google search I found a performance comparison of XmlReader, LINQ to XML and XDocument.Load.
https://web.archive.org/web/20130517114458/http://www.nearinfinity.com/blogs/joe_ferner/performance_linq_to_sql_vs.html
I would personally look at using LINQ to XML with the streaming techniques outlined in the Microsoft documentation: http://msdn.microsoft.com/en-us/library/system.xml.linq.xstreamingelement.aspx#Y1392
Here's a quick benchmark that reads from a 200 MB XML file with a simple filter:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Xml;
    using System.Xml.Linq;

    var xmlFilename = "test.xml";

    // Create the test XML file.
    var initMemoryUsage = GC.GetTotalMemory(true);
    var timer = System.Diagnostics.Stopwatch.StartNew();
    var rand = new Random();
    // To stream the output, XStreamingElement must be used for every parent element of the collection, so no XDocument here.
    var testDoc = new XStreamingElement("root",
        Enumerable.Range(1, 10000000).Select(idx => new XElement("child", new XAttribute("id", rand.Next(0, 1000))))
    );
    testDoc.Save(xmlFilename);
    var outStat = String.Format("{0:f2} sec {1:n0} kb //linq to xml output streamed", timer.Elapsed.TotalSeconds, (GC.GetTotalMemory(false) - initMemoryUsage) / 1024);

    // LINQ to XML, not streamed: the whole document is loaded into memory.
    initMemoryUsage = GC.GetTotalMemory(true);
    timer.Restart();
    var col1 = XDocument.Load(xmlFilename).Root.Elements("child").Where(e => (int)e.Attribute("id") < 10).Select(e => (int)e.Attribute("id")).ToArray();
    var stat1 = String.Format("{0:f2} sec {1:n0} kb //linq to xml input not streamed", timer.Elapsed.TotalSeconds, (GC.GetTotalMemory(false) - initMemoryUsage) / 1024);

    // XmlReader: forward-only read, one node at a time.
    initMemoryUsage = GC.GetTotalMemory(true);
    timer.Restart();
    var col2 = new List<int>();
    using (var reader = new XmlTextReader(xmlFilename))
    {
        while (reader.ReadToFollowing("child"))
        {
            reader.MoveToAttribute("id");
            int value = Convert.ToInt32(reader.Value);
            if (value < 10)
                col2.Add(value);
        }
    }
    var stat2 = String.Format("{0:f2} sec {1:n0} kb //xmlreader", timer.Elapsed.TotalSeconds, (GC.GetTotalMemory(false) - initMemoryUsage) / 1024);

    // LINQ to XML, streamed: elements are materialized one at a time.
    initMemoryUsage = GC.GetTotalMemory(true);
    timer.Restart();
    var col3 = StreamElements(xmlFilename, "child").Where(e => (int)e.Attribute("id") < 10).Select(e => (int)e.Attribute("id")).ToArray();
    var stat3 = String.Format("{0:f2} sec {1:n0} kb //linq to xml input streamed", timer.Elapsed.TotalSeconds, (GC.GetTotalMemory(false) - initMemoryUsage) / 1024);

    // Utility method: lazily yields each matching element as an XElement.
    public static IEnumerable<XElement> StreamElements(string filename, string elementName)
    {
        using (var reader = XmlReader.Create(filename))
        {
            // XElement.ReadFrom leaves the reader on the node after the element,
            // which may already be the next match, so check reader.Name first.
            while (reader.Name == elementName || reader.ReadToFollowing(elementName))
                yield return (XElement)XElement.ReadFrom(reader);
        }
    }
And here's the processing time and memory usage on my machine:
    11.49 sec         225 kb  // linq to xml output streamed
    17.36 sec     782,312 kb  // linq to xml input not streamed
     6.52 sec       1,825 kb  // xmlreader
    11.74 sec       2,238 kb  // linq to xml input streamed
Write a few benchmark tests to establish exactly what the situation is for you, and take it from there. LINQ to XML introduces a lot of flexibility.