
XmlReader behaves differently with line breaks

If the data is on a single line, the calls index = int.Parse(logDataReader.ReadElementContentAsString()); and value = double.Parse(logDataReader.ReadElementContentAsString(), CultureInfo.InvariantCulture); cause the cursor to move forward too far. If I take those calls out, I see it loop 6 times in the debugger.

In the following, only 3 <data> elements are read from the first block (<logData id="Alpha">), and they are wrong because each value comes from the next index. From the second block (<logData id="Bravo">), all <data> elements are read.

It is not an option to edit the XML and put in line breaks, as that file is created dynamically (by XmlWriter). The NewLineChars setting is a line feed. The XmlWriter output is actually just one line; I broke it up to figure out where it was breaking. In the browser it is displayed properly.

How to fix this?

Here is my XML:

<?xml version="1.0" encoding="utf-8"?>
<log>
   <logData id="Alpha">
      <data><index>100</index><value>150</value></data>
      <data><index>110</index><value>750</value></data>
      <data><index>120</index><value>750</value></data>
      <data><index>130</index><value>150</value></data>
      <data><index>140</index><value>0</value></data>
      <data><index>150</index><value>222</value></data>
   </logData>
   <logData id="Bravo">
      <data>
         <index>100</index>
         <value>25</value>
      </data>
      <data>
         <index>110</index>
         <value>11</value>
      </data>
      <data>
         <index>120</index>
         <value>1</value>
      </data>
      <data>
         <index>130</index>
         <value>25</value></data>
      <data>
         <index>140</index>
         <value>0</value>
      </data>
      <data>
         <index>150</index>
         <value>1</value>
      </data>
   </logData>
</log>

And my code:

static void Main(string[] args)
{
    List<LogData> logDatas = GetLogDatasFromFile("singleVersusMultLine.xml");
    Debug.WriteLine("Main");
    Debug.WriteLine("logData");
    foreach (LogData logData in logDatas)
    {
        Debug.WriteLine($"    logData.ID {logData.ID}");
        foreach(LogPoint logPoint in logData.LogPoints)
        {
            Debug.WriteLine($"        logData.Index {logPoint.Index}  logData.Value {logPoint.Value}");
        }
    }
    Debug.WriteLine("end");
}

public static List<LogData> GetLogDatasFromFile(string xmlFile)
{
    List<LogData> logDatas = new List<LogData>();

    using (XmlReader reader = XmlReader.Create(xmlFile))
    {
        // move to next "logData"
        while (reader.ReadToFollowing("logData"))
        {
            var logData = new LogData(reader.GetAttribute("id"));
            using (var logDataReader = reader.ReadSubtree())
            {
                // inside "logData" subtree, move to next "data"
                while (logDataReader.ReadToFollowing("data"))
                {
                    // move to index
                    logDataReader.ReadToFollowing("index");
                    // read index
                    var index = int.Parse(logDataReader.ReadElementContentAsString());
                    // move to value
                    logDataReader.ReadToFollowing("value");
                    // read value
                    var value = double.Parse(logDataReader.ReadElementContentAsString(), CultureInfo.InvariantCulture);
                    logData.LogPoints.Add(new LogPoint(index, value));
                }
            }
            logDatas.Add(logData);
        }
    }
    return logDatas;
}

public class LogData
{
    public string ID { get; }
    public List<LogPoint> LogPoints { get; } = new List<LogPoint>();
    public LogData (string id)
    {
        ID = id;
    }
}
public class LogPoint
{
    public int Index { get; }
    public double Value { get; }
    public LogPoint ( int index, double value)
    {
        Index = index;
        Value = value;
    }
}
paparazzo asked Apr 28 '18 17:04



1 Answer

Your problem is as follows. According to the documentation for XmlReader.ReadElementContentAsString():

This method reads the start tag, the contents of the element, and moves the reader past the end element tag.

And from the documentation for XmlReader.ReadToFollowing(String):

It advances the reader to the next following element that matches the specified name and returns true if a matching element is found.

Thus, after the call to ReadElementContentAsString(), the reader has already been advanced to the next node. When the XML is on a single line, that next node is the following <value> or <data> element itself. If you then call ReadToFollowing(), this element is skipped, because the method unconditionally reads forward before looking for an element with the requested name. If the XML is indented, however, the node immediately after the call to ReadElementContentAsString() is an XmlNodeType.Whitespace node, which happens to protect against this bug.
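To see the difference directly, here is a small sketch that reports which node the reader is on right after ReadElementContentAsString() has consumed <index>. The class name NodeAfterReadDemo and the helper NodeAfterIndex are made up for this illustration, and the two XML strings are cut-down versions of the data in the question.

```csharp
using System;
using System.IO;
using System.Xml;

public static class NodeAfterReadDemo
{
    // Hypothetical helper (not part of the question's code): returns a
    // description of the node the reader is positioned on immediately after
    // ReadElementContentAsString() has consumed the <index> element.
    public static string NodeAfterIndex(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadToFollowing("index");
            reader.ReadElementContentAsString();
            return reader.NodeType == XmlNodeType.Element
                ? $"{reader.NodeType}:{reader.LocalName}"
                : reader.NodeType.ToString();
        }
    }

    public static void Main()
    {
        string compact = "<data><index>100</index><value>150</value></data>";
        string indented = "<data>\n  <index>100</index>\n  <value>150</value>\n</data>";

        // Single-line XML: the reader is already on <value>.
        Console.WriteLine(NodeAfterIndex(compact));   // Element:value
        // Indented XML: a whitespace node sits between </index> and <value>.
        Console.WriteLine(NodeAfterIndex(indented));  // Whitespace
    }
}
```

In the single-line case, a following ReadToFollowing("value") starts searching after the <value> the reader is already on, so it lands on the <value> of the next <data>.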

The solution is to check whether the reader is already positioned correctly after the call to ReadElementContentAsString(). First, introduce the following extension method:

public static class XmlReaderExtensions
{
    public static bool ReadToFollowingOrCurrent(this XmlReader reader, string localName, string namespaceURI)
    {
        if (reader == null)
            throw new ArgumentNullException(nameof(reader));
        if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName && reader.NamespaceURI == namespaceURI)
            return true;
        return reader.ReadToFollowing(localName, namespaceURI);
    }
}

Then modify your code as follows:

public static List<LogData> GetLogDatasFromFile(string xmlFile)
{
    List<LogData> logDatas = new List<LogData>();

    using (XmlReader reader = XmlReader.Create(xmlFile))
    {
        // move to next "logData"
        while (reader.ReadToFollowing("logData", ""))
        {
            var logData = new LogData(reader.GetAttribute("id"));
            using (var logDataReader = reader.ReadSubtree())
            {
                // inside "logData" subtree, move to next "data"
                while (logDataReader.ReadToFollowing("data", ""))
                {
                    // move to index
                    logDataReader.ReadToFollowing("index", "");
                    // read index
                    var index = XmlConvert.ToInt32(logDataReader.ReadElementContentAsString());
                    // move to value
                    logDataReader.ReadToFollowingOrCurrent("value", "");
                    // read value
                    var value = XmlConvert.ToDouble(logDataReader.ReadElementContentAsString());
                    logData.LogPoints.Add(new LogPoint(index, value));
                }
            }
            logDatas.Add(logData);
        }
    }
    return logDatas;
}

Notes:

  • Always prefer XmlReader methods in which the local name and namespace are specified separately, such as XmlReader.ReadToFollowing(String, String). When you use a method such as XmlReader.ReadToFollowing(String), which accepts a single qualified name, you are implicitly hardcoding the choice of XML prefix, which is generally not a good idea. XML parsing should be independent of prefix choice.

  • While you correctly parsed your double using the CultureInfo.InvariantCulture locale, it's even easier to use the methods from the XmlConvert class to handle parsing and formatting correctly.

  • XmlReader.ReadSubtree() leaves the XmlReader positioned on the EndElement node of the element being read, so you shouldn't need to call ReadToFollowingOrCurrent() afterwards. (Nice use of ReadSubtree() to avoid reading too little or too much by the way; by using this method one can avoid several frequent mistakes with XmlReader.)

  • As you have found, code that manually reads XML using XmlReader should always be unit-tested with both formatted and unformatted XML, because certain bugs will only arise with one or the other. (See e.g. this answer, this one and this one for other examples.)
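As a rough sketch of the last note, the check below parses the same data once as a single line and once indented, then compares the results. FormattingCheck and ParsePoints are hypothetical names for this example; ParsePoints reads from a string rather than a file and skips the ReadSubtree() step to stay short, so treat it as an illustration rather than a drop-in replacement for GetLogDatasFromFile.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;

public static class XmlReaderExtensions
{
    // Same extension method as above, repeated so the sample is self-contained.
    public static bool ReadToFollowingOrCurrent(this XmlReader reader, string localName, string namespaceURI)
    {
        if (reader == null)
            throw new ArgumentNullException(nameof(reader));
        if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName && reader.NamespaceURI == namespaceURI)
            return true;
        return reader.ReadToFollowing(localName, namespaceURI);
    }
}

public static class FormattingCheck
{
    // Hypothetical helper: collects (index, value) pairs from an XML string.
    public static List<KeyValuePair<int, double>> ParsePoints(string xml)
    {
        var points = new List<KeyValuePair<int, double>>();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.ReadToFollowing("data", ""))
            {
                reader.ReadToFollowing("index", "");
                int index = XmlConvert.ToInt32(reader.ReadElementContentAsString());
                // The reader may already be on <value> when the XML is unformatted.
                reader.ReadToFollowingOrCurrent("value", "");
                double value = XmlConvert.ToDouble(reader.ReadElementContentAsString());
                points.Add(new KeyValuePair<int, double>(index, value));
            }
        }
        return points;
    }

    public static void Main()
    {
        string compact =
            "<logData id=\"Alpha\"><data><index>100</index><value>150</value></data>" +
            "<data><index>110</index><value>750</value></data></logData>";
        string indented =
            "<logData id=\"Alpha\">\n  <data>\n    <index>100</index>\n    <value>150</value>\n  </data>\n" +
            "  <data>\n    <index>110</index>\n    <value>750</value>\n  </data>\n</logData>";

        var fromCompact = ParsePoints(compact);
        var fromIndented = ParsePoints(indented);

        // Both layouts should yield the same points in the same order.
        Console.WriteLine(fromCompact.SequenceEqual(fromIndented));
        Console.WriteLine(fromCompact.Count);
    }
}
```

In a real project the two comparisons would live in unit tests; swapping ReadToFollowingOrCurrent back to a plain ReadToFollowing should make the single-line case fail while the indented case still passes.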

Working sample .NET fiddle here.

dbc answered Sep 23 '22 14:09
