
XmlException while deserializing xml file in UTF-16 encoding format

Using C#'s XmlSerializer.

While deserializing all the XML files in a given folder, I get an XmlException, "There is an error in XML document (0, 0)", whose InnerException is "There is no Unicode byte order mark. Cannot switch to Unicode".

All the XML files in the directory are "UTF-16" encoded. The only difference between them is that some files are missing elements that are defined in the class I deserialize into.

For example, suppose I have 3 different kinds of XML files in my folder:

file1.xml

<?xml version="1.0" encoding="utf-16"?>
<ns0:PaymentStatus xmlns:ns0="http://my.PaymentStatus">
</ns0:PaymentStatus>

file2.xml

<?xml version="1.0" encoding="utf-16"?>
<ns0:PaymentStatus xmlns:ns0="http://my.PaymentStatus">
<PaymentStatus2 RowNum="1" FeedID="38" />
</ns0:PaymentStatus>

file3.xml

<?xml version="1.0" encoding="utf-16"?>
<ns0:PaymentStatus xmlns:ns0="http://my.PaymentStatus">
<PaymentStatus2 RowNum="1" FeedID="38" />
<PaymentStatus2 RowNum="2" FeedID="39" Amt="26.0000" />
</ns0:PaymentStatus>

I have a class to represent the above xml:

[XmlTypeAttribute(AnonymousType = true, Namespace = "http://my.PaymentStatus")]
[XmlRootAttribute("PaymentStatus", Namespace = "http://my.PaymentStatus", IsNullable = true)]
public class PaymentStatus
{

    private PaymentStatus2[] PaymentStatus2Field;

    [XmlElementAttribute("PaymentStatus2", Namespace = "")]
    public PaymentStatus2[] PaymentStatus2 { get; set; }

    public PaymentStatus()
    {
        PaymentStatus2Field = null;
    }
}

[XmlTypeAttribute(AnonymousType = true)]
[XmlRootAttribute(Namespace = "", IsNullable = true)]

public class PaymentStatus2
{

    private byte rowNumField;
    private byte feedIDField;
    private decimal AmtField;
    public PaymentStatus2()
    {
        rowNumField = 0;
        feedIDField = 0;
        AmtField = 0.0M;
    }

    [XmlAttributeAttribute()]
    public byte RowNum { get; set; }

    [XmlAttributeAttribute()]
    public byte FeedID { get; set; }
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public decimal Amt { get; set; }
}

The following snippet does the deserializing:

foreach (string f in filePaths)
{
  XmlSerializer xsw = new XmlSerializer(typeof(PaymentStatus));
  FileStream fs = new FileStream(f, FileMode.Open);
  PaymentStatus config = (PaymentStatus)xsw.Deserialize(new XmlTextReader(fs));
}

Am I missing something? It has to be something to do with the encoding format, because when I manually replace UTF-16 with UTF-8, deserialization works just fine.

keeda asked Aug 14 '14 00:08



2 Answers

I ran into this same error today working with a third party web service.

I followed Alexei's advice: wrap the file stream in a StreamReader with an explicit encoding, then pass that StreamReader to the XmlTextReader constructor. Here's an implementation of this using the code from the original question:

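// Create the serializer once; it can be reused for every file.
XmlSerializer xsw = new XmlSerializer(typeof(PaymentStatus));
foreach (string f in filePaths)
{
  // Decode the bytes as UTF-8 explicitly. Because the XmlTextReader is given a
  // TextReader, the encoding="utf-16" declaration no longer drives decoding.
  using (FileStream fs = new FileStream(f, FileMode.Open))
  using (StreamReader stream = new StreamReader(fs, Encoding.UTF8))
  using (XmlTextReader reader = new XmlTextReader(stream))
  {
    PaymentStatus config = (PaymentStatus)xsw.Deserialize(reader);
  }
}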
John Oberreuter answered Sep 20 '22 11:09


Most likely the encoding="utf-16" declaration does not match the encoding the XML files are actually stored in, so the parser fails when it tries to read the stream as UTF-16 text.

Since you mention that changing the "encoding" parameter to "utf-8" lets you read the text, I assume the files are actually UTF-8. You can easily verify that by opening the files as binary instead of text in your editor of choice (e.g. Visual Studio).
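
The same check can be done in code by looking at the first bytes of each file. This is only a minimal sketch: the class and method names (EncodingProbe, DescribeBom, DescribeFile) are made up for illustration, and it only recognizes the UTF-8 and UTF-16 byte order marks.

```csharp
using System;
using System.IO;

public static class EncodingProbe
{
    // Returns a label for the byte order mark at the start of the buffer,
    // or "no BOM" when none of the marks checked here is present.
    // Note: a UTF-32 LE file also starts with FF FE and would be
    // labelled UTF-16 LE by this simplified check.
    public static string DescribeBom(byte[] head)
    {
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return "UTF-8 BOM";
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return "UTF-16 LE BOM";
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return "UTF-16 BE BOM";
        return "no BOM";
    }

    // Reads up to the first three bytes of a file and describes its BOM.
    public static string DescribeFile(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            byte[] head = new byte[3];
            int read = fs.Read(head, 0, head.Length);
            Array.Resize(ref head, read);
            return DescribeBom(head);
        }
    }
}
```

Calling EncodingProbe.DescribeFile(f) for each path would show which files carry a UTF-16 mark. A file that reports "no BOM" and begins with the bytes 3C 3F (the characters "<?" at one byte each) is not stored as UTF-16, whatever its declaration says.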

The most likely cause of such a mismatch is saving the XML with writer.Write(document.OuterXml): the string representation is produced first, which carries "utf-16" in its declaration, but the string is then written to the stream with the default UTF-8 encoding.
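
To illustrate how a declaration and the stored bytes can drift apart, here is a small sketch. It is an assumption about how the files might have been produced, not a reconstruction of the actual writing code: it uses an XmlWriter over a StringWriter rather than the writer.Write(document.OuterXml) call mentioned above, and the class and method names are hypothetical.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class MismatchDemo
{
    // Builds XML text through a StringWriter. XmlWriter takes the declared
    // encoding from the TextWriter, and StringWriter reports UTF-16.
    public static string BuildXmlString()
    {
        StringWriter sw = new StringWriter();
        using (XmlWriter xw = XmlWriter.Create(sw))
        {
            xw.WriteStartElement("PaymentStatus");
            xw.WriteEndElement();
        }
        return sw.ToString();
    }

    // Encodes the text as UTF-8 without a BOM, which is what StreamWriter
    // and File.WriteAllText do when no encoding is specified.
    public static byte[] EncodeAsDefaultUtf8(string xml)
    {
        return new UTF8Encoding(false).GetBytes(xml);
    }

    // Returns true when parsing the bytes from a raw stream throws an
    // XmlException, as happens when the declaration and bytes disagree.
    public static bool FailsWhenReadFromStream(byte[] bytes)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new MemoryStream(bytes)))
            {
                while (reader.Read()) { }
            }
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }
}
```

Passing the result of BuildXmlString through EncodeAsDefaultUtf8 gives bytes that say encoding="utf-16" but are stored one byte per character with no byte order mark, which is the situation the parser complains about.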

Possible workaround: read the XML in a way that is symmetrical to the code that wrote it, i.e. read the file as a string and then load the XML from that string.
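
A minimal sketch of that workaround follows, assuming the files really are UTF-8 on disk. The helper class StringXml and the stand-in type Note are hypothetical; in the question's code the type argument would be PaymentStatus.

```csharp
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical stand-in type, used only to keep the example self-contained.
public class Note
{
    public string Text { get; set; }
}

public static class StringXml
{
    // Deserializes from text that is already decoded. The reader works on
    // characters, so the encoding named in the declaration is not used.
    public static T FromString<T>(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (StringReader reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }

    // Reads the whole file as UTF-8 text (an assumption about how the
    // files are stored) and then deserializes from the string.
    public static T FromUtf8File<T>(string path)
    {
        string xml = File.ReadAllText(path, Encoding.UTF8);
        return FromString<T>(xml);
    }
}
```

With this in place the loop body would reduce to something like StringXml.FromUtf8File<PaymentStatus>(f). The cost is that the declared encoding is ignored entirely, so a file that really is UTF-16 without a BOM would be misread.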

Proper fix: make sure the XML is stored in the encoding its declaration names.
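
On the writing side, one way to keep the two in step is to let XmlWriter write straight to the stream with an explicit XmlWriterSettings.Encoding, so the declaration and the bytes come from the same setting. This is a sketch under that assumption; the class name ConsistentWriter and the empty PaymentStatus element are illustrative only.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class ConsistentWriter
{
    // Writes a single empty root element to memory using the given encoding.
    // XmlWriter emits both the XML declaration and the bytes according to
    // settings.Encoding, so they cannot disagree.
    public static byte[] WriteEmptyRoot(Encoding encoding)
    {
        XmlWriterSettings settings = new XmlWriterSettings { Encoding = encoding };
        using (MemoryStream ms = new MemoryStream())
        {
            using (XmlWriter xw = XmlWriter.Create(ms, settings))
            {
                xw.WriteStartElement("PaymentStatus");
                xw.WriteEndElement();
            }
            // The writer is disposed (and flushed) before the bytes are copied.
            return ms.ToArray();
        }
    }
}
```

Calling WriteEmptyRoot(Encoding.Unicode) produces UTF-16 bytes with a matching declaration, while WriteEmptyRoot(new UTF8Encoding(false)) produces UTF-8 bytes declared as utf-8. Either result can be read back from a plain stream without overriding the encoding.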

Alexei Levenkov answered Sep 20 '22 11:09