
Deciding on when to use XmlDocument vs XmlReader

I'm optimizing a custom object-to-XML serialization utility. It's all done and working, and that's not the issue.

It worked by loading a file into an XmlDocument object, then recursively going through all the child nodes.

I figured that perhaps using XmlReader instead of having XmlDocument loading/parsing the entire thing would be faster, so I implemented that version as well.

The algorithms are exactly the same: I use a wrapper class to abstract the functionality of dealing with an XmlNode vs. an XmlReader. For instance, the GetChildren methods yield-return either a child XmlNode or a subtree XmlReader.
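For reference, the XmlNode-backed version of that method is just an iteration over ChildNodes, roughly like this (a simplified sketch, not the actual utility code; XmlNodeXmlSourceProvider and the myXmlNode field are hypothetical names mirroring the reader version shown further down):

    public override IEnumerable<IXmlSourceProvider> GetChildren ()
    {
        // myXmlNode is the XmlNode this wrapper instance holds (assumed name)
        foreach (XmlNode child in myXmlNode.ChildNodes)
        {
            if (child.NodeType != XmlNodeType.Element) continue;
            yield return new XmlNodeXmlSourceProvider (child);
        }
    }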

So I wrote a test driver to exercise both versions, using a non-trivial data set (a 900 KB XML file with around 1,350 elements).

However, using JetBrains dotTrace, I see that the XmlReader version is actually slower than the XmlDocument version! It seems that there is some significant processing involved in XmlReader read calls when I'm iterating over child nodes.

So I say all that to ask this:

What are the advantages/disadvantages of XmlDocument and XmlReader, and in what circumstances should you use either?

My guess is that there is a file-size threshold at which XmlReader becomes more economical in performance, as well as less memory-intensive. However, that threshold seems to be above 1 MB.

I'm calling ReadSubtree every time to process child nodes:

    public override IEnumerable<IXmlSourceProvider> GetChildren ()
    {
        XmlReader xr = myXmlSource.ReadSubtree ();
        // skip past the current element
        xr.Read ();

        while (xr.Read ())
        {
            if (xr.NodeType != XmlNodeType.Element) continue;
            yield return new XmlReaderXmlSourceProvider (xr);
        }
    }

That test applies to a lot of objects at a single level (i.e. wide and shallow), but I wonder how well XmlReader fares when the XML is deep and wide. The XML I'm dealing with is much like a data object model: one parent object to many child objects, and so on (1..M..M..M).

I also don't know beforehand the structure of the XML I'm parsing, so I can't optimize for it.

asked Oct 01 '09 by PhilChuang


People also ask

Should I use XDocument or XmlDocument?

XDocument is from the LINQ to XML API, and XmlDocument is the standard DOM-style API for XML. If you know the DOM well and don't want to learn LINQ to XML, go with XmlDocument. If you're new to both, compare the two and pick the one whose style you prefer.
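For a feel of the difference, here is the same lookup in both APIs (a minimal sketch; books.xml and the title element are placeholders):

    using System;
    using System.Linq;
    using System.Xml;
    using System.Xml.Linq;

    // LINQ to XML: query-style navigation
    XDocument xdoc = XDocument.Load ("books.xml");
    foreach (string title in xdoc.Descendants ("title").Select (e => e.Value))
        Console.WriteLine (title);

    // DOM: node-based navigation
    XmlDocument doc = new XmlDocument ();
    doc.Load ("books.xml");
    foreach (XmlNode node in doc.GetElementsByTagName ("title"))
        Console.WriteLine (node.InnerText);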

What is XmlDocument?

The XmlDocument class is an in-memory representation of an XML document. It implements the W3C Document Object Model (DOM) Level 1 Core and the Core DOM Level 2. To read more about the model, see XML Document Object Model (DOM).
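A minimal illustration of the DOM model (the file name is a placeholder): the whole document is parsed up front, after which you can navigate the tree freely, in any direction and as often as you like.

    using System;
    using System.Xml;

    XmlDocument doc = new XmlDocument ();
    doc.Load ("data.xml");              // the entire document is parsed into memory here

    // forward traversal of the root's children...
    foreach (XmlNode child in doc.DocumentElement.ChildNodes)
        Console.WriteLine (child.Name);

    // ...but random access works too, e.g. via XPath
    XmlNode first = doc.SelectSingleNode ("/*/*[1]");
    Console.WriteLine (first != null ? first.OuterXml : "(empty)");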

Is XmlDocument disposable?

XmlDocument can't be disposed because it does not implement IDisposable.

What is XML text reader?

XmlTextReader provides forward-only, read-only access to a stream of XML data. The current node refers to the node on which the reader is positioned. The reader is advanced using any of the read methods, and properties reflect the value of the current node.
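A minimal read loop looks like this (a sketch; note that XmlReader.Create is the usual way to obtain a reader these days, with XmlTextReader being the older concrete class; the file name is a placeholder):

    using System;
    using System.Xml;

    using (XmlReader reader = XmlReader.Create ("data.xml"))
    {
        while (reader.Read ())          // advance to the next node
        {
            // properties like Name, NodeType and Depth describe the current node
            if (reader.NodeType == XmlNodeType.Element)
                Console.WriteLine ("{0} (depth {1})", reader.Name, reader.Depth);
        }
    }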


2 Answers

I've generally looked at it not from a fastest perspective, but rather from a memory utilization perspective. All of the implementations have been fast enough for the usage scenarios I've used them in (typical enterprise integration).

However, where I've fallen down, and sometimes spectacularly, is not taking into account the general size of the XML I'm working with. If you think about it up front you can save yourself some grief.

XML tends to bloat when loaded into memory, at least with a DOM reader like XmlDocument or XPathDocument. Something like 10:1? The exact ratio is hard to quantify, but if the document is 1 MB on disk, for example, it will be 10 MB or more in memory.
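A rough way to see this for yourself (a sketch, not a rigorous benchmark: GC.GetTotalMemory only approximates managed heap usage, and big.xml is a placeholder):

    using System;
    using System.IO;
    using System.Xml;

    long before = GC.GetTotalMemory (forceFullCollection: true);

    XmlDocument doc = new XmlDocument ();
    doc.Load ("big.xml");

    long after = GC.GetTotalMemory (forceFullCollection: true);
    long onDisk = new FileInfo ("big.xml").Length;

    Console.WriteLine ("on disk: {0:N0} bytes, in memory: ~{1:N0} bytes", onDisk, after - before);
    GC.KeepAlive (doc);                 // keep the document alive past the second measurement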

A process using any reader that loads the whole document into memory (XmlDocument/XPathDocument) can suffer from large object heap fragmentation, which can ultimately lead to OutOfMemoryExceptions (even with memory still available), resulting in an unavailable service or process.

Since objects larger than about 85 KB end up on the large object heap, and you've got a 10:1 size explosion with a DOM reader, it doesn't take much before your XML documents are being allocated from the large object heap.

XmlDocument is very easy to use; it's seductively simple. Its only real drawback is that it loads the whole XML document into memory to process it.

XmlReader is a stream-based reader, so it keeps your process memory utilization generally flatter, but it is more difficult to use.

XPathDocument tends to be a faster, read-only version of XmlDocument, but still suffers from memory 'bloat'.
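For example, an XPathDocument is queried through an XPathNavigator rather than mutated (a minimal sketch; the file and element names are placeholders):

    using System;
    using System.Xml.XPath;

    // read-only, XPath-oriented in-memory model
    XPathDocument xpathDoc = new XPathDocument ("data.xml");
    XPathNavigator nav = xpathDoc.CreateNavigator ();

    foreach (XPathNavigator node in nav.Select ("//item"))
        Console.WriteLine (node.Value);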

answered Sep 20 '22 by Zach Bonham


XmlDocument is an in-memory representation of the entire XML document. Therefore, if your document is large, it will consume much more memory than it would if you had read it using XmlReader.

This is assuming that when you use XmlReader, you read and process the elements one by one and then discard them. If you use XmlReader and construct another intermediary structure in memory, then you have the same problem, and you're defeating its purpose.
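Concretely, the streaming win only materializes if each element is handled as the reader passes over it and nothing is retained, something like this (a sketch; the file name and record element are placeholders):

    using System;
    using System.Xml;

    int count = 0;
    using (XmlReader reader = XmlReader.Create ("big.xml"))
    {
        while (reader.ReadToFollowing ("record"))   // position on the next <record>
        {
            string id = reader.GetAttribute ("id"); // process it in place...
            if (id != null) count++;                // ...and keep nothing but the tally
        }
    }
    Console.WriteLine ("{0} records processed", count);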

Google for "SAX versus DOM" to read more about the difference between the two models of processing XML.

answered Sep 23 '22 by DSO