 

Reading XML with unclosed tags in C#

Tags: c#, xml-parsing

I have a program that runs tests and generates a grid view of all the results, along with an XML log file. The program can also load these log files to recreate the grid view.

Since the program writes to the log file as it's executing, if it crashes the log file will be missing its closing tags. I still want to be able to load these XML files, though, as they contain lots of valuable data that can help me find out what caused the crash.

I was thinking of either going through the XML file and closing off any unclosed tags, or writing some kind of "dirty" XML reader that would pretend every tag was closed. Any ideas on what I could do or how I should proceed?
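The first idea, closing off the unclosed tags, can be done without editing the raw text by hand. A minimal sketch, assuming the log is well-formed up to the crash point: copy every node that XmlReader manages to parse into an XmlWriter, stop at the XmlException, then write an end tag for each element that is still open. The class and method names are hypothetical, and comments, processing instructions and the XML declaration are dropped for brevity.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper: rebuilds a well-formed document from a truncated one.
static class TruncatedXmlRepair
{
    public static string Repair(TextReader input)
    {
        var output = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlReader reader = XmlReader.Create(input))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            int open = 0; // elements started in the output but not yet ended
            try
            {
                while (reader.Read())
                {
                    switch (reader.NodeType)
                    {
                        case XmlNodeType.Element:
                            bool isEmpty = reader.IsEmptyElement;
                            writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                            writer.WriteAttributes(reader, true); // copies attributes, reader stays on the element
                            if (isEmpty) writer.WriteEndElement(); else open++;
                            break;
                        case XmlNodeType.EndElement:
                            writer.WriteFullEndElement();
                            open--;
                            break;
                        case XmlNodeType.Text:
                            writer.WriteString(reader.Value);
                            break;
                        case XmlNodeType.CDATA:
                            writer.WriteCData(reader.Value);
                            break;
                        case XmlNodeType.Whitespace:
                        case XmlNodeType.SignificantWhitespace:
                            writer.WriteWhitespace(reader.Value);
                            break;
                        // Comments, PIs and the XML declaration are skipped in this sketch.
                    }
                }
            }
            catch (XmlException)
            {
                // Reached the truncation point: stop copying.
            }
            // Close whatever was still open when the input ran out.
            while (open-- > 0)
                writer.WriteEndElement();
        }
        return output.ToString();
    }
}
```

The repaired string can then be passed to whatever loader already handles complete logs. Any element that was cut off mid-way ends up in the output with only the content read before the crash.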

Edit:

<Root>
  <Parent>
     <Child Name="One">
        <Foo>...</Foo>
        <Bar>...</Bar>
        <Baz>...</Baz>
     </Child>
     <Child Name="Two">
        <Foo>...</Foo>
        <Bar>...</Bar>
 !-- Crash happens here --!

From this I would still like to produce:

 Child   Foo   Bar   Baz
 One     ...   ...   ...
 Two     ...   ...    /
Benjamin asked Mar 14 '12 14:03



1 Answer

Presumably the file is valid up to the point where it's truncated, so using XmlReader could work; just be prepared for it to throw an XmlException when it reaches the truncation point.

Now, the XmlReader API isn't terribly pleasant (IMO), so you might want to move to the start of some interesting data (which would have to be complete in itself) and then call the XNode.ReadFrom(XmlReader) method to get that data in a simple-to-use form. Then move to the start of the next element, do the same, and so on.

Sample code:

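using System;
using System.Xml;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        using (XmlReader reader = XmlReader.Create("test.xml"))
        {
            while (true)
            {
                // Skip forward until the reader is positioned on a <Child> start tag.
                while (reader.NodeType != XmlNodeType.Element ||
                    reader.LocalName != "Child")
                {
                    if (!reader.Read())
                    {
                        Console.WriteLine("Finished!");
                        return; // end of a complete file: stop instead of looping forever
                    }
                }
                // Read the whole <Child> element (it must be complete in itself);
                // this also moves the reader past it.
                XElement element = (XElement) XNode.ReadFrom(reader);
                Console.WriteLine("Got child: {0}", element.Value);
            }
        }
    }
}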

Sample XML:

<Root>
  <Parent>
    <Child>First child</Child>
    <Child>Second child</Child>
    <Child>Broken

Sample output:

Got child: First child
Got child: Second child

Unhandled Exception: System.Xml.XmlException: Unexpected end of file has occurred
The following elements are not closed: Child, Parent, Root. Line 5, position 18.
   at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
   at System.Xml.XmlTextReaderImpl.ParseElementContent()
   at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r)
   at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r, LoadOptions o)
   at System.Xml.Linq.XElement.ReadElementFrom(XmlReader r, LoadOptions o)
   at System.Xml.Linq.XNode.ReadFrom(XmlReader reader)
   at Program.Main(String[] args)

So obviously you'd want to catch the exception, but you can see that it managed to read the first two elements correctly.
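To get the partially written row from the question (Two with Baz shown as "/"), the "complete in itself" unit has to be the individual Foo/Bar/Baz element rather than the whole Child. A minimal sketch under those assumptions: the element and attribute names come from the question's sample log, "/" marks a cell that was never written, and the class name is hypothetical. It commits a cell when its end tag is returned instead of calling XNode.ReadFrom, because ReadFrom also advances the reader past the element, so a truncation right after a complete element may make that read throw and lose the element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: one dictionary per <Child>, keyed "Child", "Foo", "Bar", "Baz".
static class CrashLogRows
{
    // Column names assumed from the question's sample log.
    static readonly string[] Columns = { "Foo", "Bar", "Baz" };

    public static List<Dictionary<string, string>> ReadRows(TextReader input)
    {
        var rows = new List<Dictionary<string, string>>();
        Dictionary<string, string> row = null; // row for the <Child> being read
        string column = null;                  // cell element we are currently inside
        var text = new StringBuilder();
        try
        {
            using (XmlReader reader = XmlReader.Create(input))
            {
                while (reader.Read())
                {
                    switch (reader.NodeType)
                    {
                        case XmlNodeType.Element when reader.LocalName == "Child":
                            row = new Dictionary<string, string> { ["Child"] = reader.GetAttribute("Name") ?? "" };
                            foreach (string c in Columns) row[c] = "/"; // "/" = never written
                            rows.Add(row);
                            break;
                        case XmlNodeType.Element when row != null && Array.IndexOf(Columns, reader.LocalName) >= 0:
                            column = reader.LocalName;
                            text.Clear();
                            if (reader.IsEmptyElement) { row[column] = ""; column = null; }
                            break;
                        case XmlNodeType.Text:
                        case XmlNodeType.CDATA:
                            if (column != null) text.Append(reader.Value);
                            break;
                        case XmlNodeType.EndElement when column != null && reader.LocalName == column:
                            // Only commit a cell once its end tag has been seen.
                            row[column] = text.ToString();
                            column = null;
                            break;
                    }
                }
            }
        }
        catch (XmlException)
        {
            // Truncated log: keep the rows and cells completed before the crash point.
        }
        return rows;
    }
}
```

Each returned dictionary maps directly onto one grid row; a log that crashed inside Child "Two" should yield that row with the unwritten cells left as "/".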

Jon Skeet answered Oct 05 '22 22:10
