How to build XmlNodes from XmlReader

I am parsing a large number of large files, and profiling shows that my bottleneck is:

XmlDocument doc = new XmlDocument();
doc.Load(filename);

This approach was very handy because I could extract nodes like this:

XmlNodeList nodeList = doc.SelectNodes("myXPath");

I am switching to XmlReader, but I am not very familiar with it. When I find the element I need to extract, I am stuck on how to build an XmlNode from it:

XmlReader xmlReader = XmlReader.Create(fileName);

while (xmlReader.Read())
{
   //keep reading until we see my element
   if (xmlReader.Name.Equals("myElementName") && (xmlReader.NodeType == XmlNodeType.Element))
   {
       // How do I get the Xml element from the reader here?
   }
}

I'd like to be able to build a List<XmlNode> object. I am on .NET 2.0.

Any help appreciated!

JohnIdol asked Oct 14 '09 13:10


3 Answers

Why not just do the following?

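XmlDocument doc = new XmlDocument();
// xmlReader must be positioned on the element; ReadNode builds the whole
// subtree and leaves the reader on the node that follows it
XmlNode node = doc.ReadNode(xmlReader);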
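
Dropping ReadNode into the question's while (xmlReader.Read()) loop needs some care: ReadNode already advances the reader past the element, so an extra Read() can step over an adjacent sibling. Here is a minimal sketch of one way to handle that, assuming a hypothetical helper named NodeCollector.Collect (not part of the original answer) and sticking to .NET 2.0 APIs:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class NodeCollector
{
    // Hypothetical helper: returns every element named elementName as an XmlNode.
    public static List<XmlNode> Collect(XmlReader reader, string elementName)
    {
        XmlDocument doc = new XmlDocument();
        List<XmlNode> nodes = new List<XmlNode>();

        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
            {
                // ReadNode consumes the whole element and leaves the reader
                // on the node after it, so we must not call Read() here.
                nodes.Add(doc.ReadNode(reader));
            }
            else
            {
                // Not a match: advance to the next node.
                reader.Read();
            }
        }
        return nodes;
    }
}
```

It could be called as NodeCollector.Collect(XmlReader.Create(fileName), "myElementName"). A matching element nested inside another matching element comes back as part of the outer node, not as a separate list entry.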
executor answered Nov 19 '22 01:11


The XmlNode type does not have a public constructor, so you cannot create them on your own. You will need to have an XmlDocument that you can use to create them:

XmlDocument doc = new XmlDocument();
List<XmlNode> nodeList = new List<XmlNode>();
while (xmlReader.Read())
{
    // Keep reading until we see the element we are looking for
    if (xmlReader.NodeType == XmlNodeType.Element && xmlReader.Name.Equals("myElementName"))
    {
        // Creates an empty element with the same name and namespace;
        // attributes and child nodes are not copied from the reader
        XmlNode myNode = doc.CreateNode(XmlNodeType.Element, xmlReader.Name, xmlReader.NamespaceURI);
        nodeList.Add(myNode);
    }
}
Fredrik Mörk answered Nov 19 '22 02:11


XmlReader and XmlDocument process XML in very different ways. XmlReader keeps nothing in memory and reads forward only, whereas XmlDocument builds a full DOM tree in memory. XmlReader helps when performance is an issue, but it also requires you to write your application differently: instead of holding on to XmlNode objects, you keep nothing and process "on the go", i.e., when an element you need passes by, you act on it. This is close to the SAX approach, but you pull nodes from the reader instead of receiving callbacks.

The answer to "how to get the XmlElement" is: you have to build it from the information the reader gives you. Unfortunately, doing that for many elements gives up much of the performance gain. Once you switch to XmlReader, it is usually better to avoid DOM approaches altogether, except in a few specific cases.

Also, the "very handy" way of extracting nodes with XPath (the SelectNodes call you show above) cannot be used directly on an XmlReader: XPath needs an in-memory tree to navigate, such as an XmlDocument. Think of XmlReader as a filtering approach instead: you tell the reader to skip certain nodes (Skip) or to read until a certain element (ReadToFollowing, ReadToDescendant). This is extremely fast, but it is a different way of thinking.
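
To illustrate the "on the go" style, here is a minimal sketch that pulls one attribute from every matching element without creating any XmlNode. The helper name StreamingFilter.ReadAttributeValues and the idea of reading an attribute are assumptions made for this example; the question does not say what data is needed from each element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class StreamingFilter
{
    // Hypothetical helper: collects one attribute value per matching element.
    public static List<string> ReadAttributeValues(
        XmlReader reader, string elementName, string attributeName)
    {
        List<string> values = new List<string>();

        // ReadToFollowing advances until the next element with this name,
        // or returns false once the end of the input is reached.
        while (reader.ReadToFollowing(elementName))
        {
            // GetAttribute does not move the reader; it returns null
            // when the element has no such attribute.
            values.Add(reader.GetAttribute(attributeName));
        }
        return values;
    }
}
```

Because GetAttribute leaves the reader on the element, the next ReadToFollowing call simply continues from there. Methods that consume content, such as ReadElementContentAsString, move the reader forward, so combining them with ReadToFollowing can skip an adjacent sibling.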

Abel answered Nov 19 '22 01:11