
Parsing XML file using C#?

I'm new to both XML and C#. I'm trying to find an efficient way to parse a given XML file and retrieve the relevant numerical values based on the value of "proj_title" (heat_run or any other possible value). For example, I want to calculate the duration of a particular test run (proj_end value minus proj_start value).

ex.xml:

<proj ID="2">
      <proj_title>heat_run</proj_title>
      <proj_start>100</proj_start>
      <proj_end>200</proj_end>
</proj>

... We can't search by proj ID since this value is not fixed from test run to test run. The above file is huge: ~8 MB, with ~2000 tags named proj_title. Is there an efficient way in C# to first find all proj_title tags whose value is "heat_run", and then retrieve the proj_start and proj_end values for each of them?

Here's my current C# code:

using System.Xml;

public class parser
{
     public static void Main()
     {
         XmlDocument xmlDoc = new XmlDocument();
         xmlDoc.Load("ex.xml");

         // ~2000 tags w/ proj_title
         // any more efficient way to just look for proj_title="heat_run" specifically?
         XmlNodeList heat_run_nodes = xmlDoc.GetElementsByTagName("proj_title");
     }
}
jerryh91 asked Jun 03 '13 16:06


3 Answers

8MB really isn't very large at all by modern standards. Personally I'd use LINQ to XML:

XDocument doc = XDocument.Load("ex.xml");
var projects = doc.Descendants("proj_title")
                  .Where(x => (string) x == "heat_run")
                  .Select(x => x.Parent) // Just for simplicity
                  .Select(x => new {
                              Start = (int) x.Element("proj_start"),
                              End = (int) x.Element("proj_end")
                          });

foreach (var project in projects)
{
    Console.WriteLine("Start: {0}; End: {1}", project.Start, project.End);
}

(Obviously adjust this to your own requirements - it's not really clear what you need to do based on the question.)

Alternative query:

var projects = doc.Descendants("proj")
                  .Where(x => (string) x.Element("proj_title") == "heat_run")
                  .Select(x => new {
                              Start = (int) x.Element("proj_start"),
                              End = (int) x.Element("proj_end")
                          });
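Since the question mentions computing the duration of a run, here is a minimal sketch of how the query above could be extended to return proj_end minus proj_start directly. The class name HeatRunDurations, the method name Compute, and the <projects> root element in the sample are assumptions made for illustration; the real file's root element is not shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class HeatRunDurations
{
    // Returns (proj_end - proj_start) for every <proj> whose <proj_title> equals title.
    public static List<int> Compute(XDocument doc, string title)
    {
        return doc.Descendants("proj")
                  .Where(p => (string) p.Element("proj_title") == title)
                  .Select(p => (int) p.Element("proj_end") - (int) p.Element("proj_start"))
                  .ToList();
    }

    public static void Main()
    {
        // Inline sample data; the <projects> root element is an assumption.
        XDocument doc = XDocument.Parse(
            "<projects>" +
            "<proj ID=\"2\"><proj_title>heat_run</proj_title>" +
            "<proj_start>100</proj_start><proj_end>200</proj_end></proj>" +
            "<proj ID=\"3\"><proj_title>cold_run</proj_title>" +
            "<proj_start>10</proj_start><proj_end>50</proj_end></proj>" +
            "</projects>");

        foreach (int duration in Compute(doc, "heat_run"))
        {
            Console.WriteLine("Duration: {0}", duration); // prints "Duration: 100"
        }
    }
}
```

For the real file, XDocument.Load("ex.xml") would replace the XDocument.Parse call. Note that the (int) casts throw if a matching proj lacks proj_start or proj_end, so casting to (int?) instead may be preferable if some entries are incomplete.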
Jon Skeet answered Nov 10 '22 12:11


You can use XPath to find all nodes that match, for example:

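XmlNodeList matches = xmlDoc.SelectNodes("//proj[proj_title='heat_run']");

matches will contain all proj nodes that match the criteria; the leading // makes the search cover proj elements at any depth rather than only a proj root element. Learn more about XPath: http://www.w3schools.com/xsl/xpath_syntax.asp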

MSDN Documentation on SelectNodes
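As a rough, self-contained sketch of the XPath approach, the following reads the start and end values from each matching proj node. The class name XPathHeatRun, the method name Durations, and the <projects> root element in the sample are assumptions for illustration; the // prefix is used so the expression matches proj elements wherever they are nested.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathHeatRun
{
    // Returns proj_end - proj_start for each <proj> whose <proj_title> is heat_run.
    public static List<int> Durations(XmlDocument xmlDoc)
    {
        var durations = new List<int>();
        // "//" searches the whole document, wherever the <proj> elements are nested.
        XmlNodeList matches = xmlDoc.SelectNodes("//proj[proj_title='heat_run']");
        foreach (XmlNode proj in matches)
        {
            int start = int.Parse(proj.SelectSingleNode("proj_start").InnerText);
            int end = int.Parse(proj.SelectSingleNode("proj_end").InnerText);
            durations.Add(end - start);
        }
        return durations;
    }

    public static void Main()
    {
        var xmlDoc = new XmlDocument();
        // Inline sample data; the <projects> root element is an assumption.
        xmlDoc.LoadXml(
            "<projects>" +
            "<proj ID=\"2\"><proj_title>heat_run</proj_title>" +
            "<proj_start>100</proj_start><proj_end>200</proj_end></proj>" +
            "<proj ID=\"3\"><proj_title>cold_run</proj_title>" +
            "<proj_start>10</proj_start><proj_end>50</proj_end></proj>" +
            "</projects>");

        foreach (int duration in Durations(xmlDoc))
        {
            Console.WriteLine("Duration: {0}", duration); // prints "Duration: 100"
        }
    }
}
```

For the real file, xmlDoc.Load("ex.xml") would replace the LoadXml call.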

wgraham answered Nov 10 '22 10:11


Use XDocument and the LINQ to XML API: http://msdn.microsoft.com/en-us/library/bb387098.aspx

If the performance is not what you expect after trying it, look for a SAX parser. A SAX parser does not load the whole document into memory and then apply an XPath expression to everything in memory. It takes an event-driven approach instead, which in some cases can be a lot faster and uses much less memory.

There are probably SAX parsers for .NET out there; I haven't used them myself for .NET, but I have for C++.
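As a hedged illustration of the streaming idea, .NET's built-in XmlReader can play a similar role: it is a forward-only pull reader rather than a SAX-style push parser, but it likewise avoids holding the whole document in memory. The sketch below materializes one proj element at a time with XNode.ReadFrom. The class name StreamingHeatRun, the method name Durations, and the <projects> root element in the sample are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StreamingHeatRun
{
    // Streams through the input and returns proj_end - proj_start
    // for each <proj> whose <proj_title> equals title.
    public static List<int> Durations(TextReader input, string title)
    {
        var durations = new List<int>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "proj")
                {
                    // ReadFrom loads only this <proj> subtree and leaves the
                    // reader positioned just after its end tag.
                    var proj = (XElement) XNode.ReadFrom(reader);
                    if ((string) proj.Element("proj_title") == title)
                    {
                        durations.Add((int) proj.Element("proj_end")
                                      - (int) proj.Element("proj_start"));
                    }
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return durations;
    }

    public static void Main()
    {
        // Inline sample data; the <projects> root element is an assumption.
        string xml =
            "<projects>" +
            "<proj ID=\"2\"><proj_title>heat_run</proj_title>" +
            "<proj_start>100</proj_start><proj_end>200</proj_end></proj>" +
            "<proj ID=\"3\"><proj_title>cold_run</proj_title>" +
            "<proj_start>10</proj_start><proj_end>50</proj_end></proj>" +
            "</projects>";

        foreach (int duration in Durations(new StringReader(xml), "heat_run"))
        {
            Console.WriteLine("Duration: {0}", duration); // prints "Duration: 100"
        }
    }
}
```

For the real file, File.OpenText("ex.xml") could be passed in place of the StringReader. For an 8 MB file this is unlikely to be necessary, but the same pattern scales to files too large to load at once.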

Philip Stuyck answered Nov 10 '22 11:11