
RESTSharp has problems deserializing XML including Byte Order Mark?

There is a public webservice which I want to use in a short C# Application: http://ws.parlament.ch/

The XML returned by this web service has a byte order mark (BOM) at the beginning, which causes RestSharp to fail when deserializing it, with the following error message:

Error retrieving response. Check inner details for more info. ---> System.Xml.XmlException: Data at the root level is invalid. Line 1, position 1.
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
   at System.Xml.XmlTextReaderImpl.ParseRootLevelWhitespace()
   at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
   at System.Xml.XmlTextReaderImpl.Read()
   at System.Xml.Linq.XDocument.Load(XmlReader reader, LoadOptions options)
   at System.Xml.Linq.XDocument.Parse(String text, LoadOptions options)
   at System.Xml.Linq.XDocument.Parse(String text)
   at RestSharp.Deserializers.XmlDeserializer.Deserialize[T](IRestResponse response)
   at RestSharp.RestClient.Deserialize[T](IRestRequest request, IRestResponse raw)
   --- End of inner exception stack trace ---

Here is a simple example that uses http://ws.parlament.ch/sessions?format=xml to get a list of sessions:

public class Session
{
    public int Id { get; set; }
    public DateTime? Updated { get; set; }
    public int? Code { get; set; }
    public DateTime? From { get; set; }
    public string Name { get; set; }
    public DateTime? To { get; set; }
}


static void Main(string[] args)
{
    var request = new RestRequest();
    request.RequestFormat = DataFormat.Xml;
    request.Resource = "sessions";
    request.AddParameter("format", "xml");

    var client = new RestClient("http://ws.parlament.ch/");
    var response = client.Execute<List<Session>>(request);

    if (response.ErrorException != null)
    {
        const string message = "Error retrieving response.  Check inner details for more info.";
        var ex = new ApplicationException(message, response.ErrorException);
        Console.WriteLine(ex);
    }

    List<Session> test = response.Data;

    Console.Read();
}

When I first manipulate the returned XML with Fiddler to remove the first 3 bytes (the "BOM"), the above code works! Could someone please help me handle this directly in RestSharp? What am I doing wrong? Thank you in advance!

dataCore asked Oct 29 '13 15:10


3 Answers

I found the Solution - Thank you @arootbeer for the hints!

Instead of wrapping the XmlDeserializer, you can also use the 'RestRequest.OnBeforeDeserialization' callback from RestSharp. You just need to insert something like this after the new RestRequest() (see my initial code example), and then it works perfectly!

request.OnBeforeDeserialization = resp =>
{
    // Remove a leading byte order mark (BOM) from the response content.
    // See: http://stackoverflow.com/questions/19663100/restsharp-has-problems-deserializing-xml-including-byte-order-mark
    string byteOrderMarkUtf8 = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
    // Ordinal comparison checks for the literal BOM character instead of a culture-sensitive match.
    if (resp.Content.StartsWith(byteOrderMarkUtf8, StringComparison.Ordinal))
        resp.Content = resp.Content.Remove(0, byteOrderMarkUtf8.Length);
};
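
The string check can also be pulled out into a small helper so it can be tried without a live request. This is only a sketch: the class name `BomText` and the method `StripLeadingBom` are made up for illustration and are not part of RestSharp. It compares the first character directly against U+FEFF, which sidesteps any culture-sensitive string comparison.

```csharp
using System;

public static class BomText
{
    // Hypothetical helper (not a RestSharp API): returns the text without a
    // leading U+FEFF character; any other input is returned unchanged.
    public static string StripLeadingBom(string text)
    {
        if (!string.IsNullOrEmpty(text) && text[0] == '\uFEFF')
            return text.Substring(1);
        return text;
    }

    public static void Main()
    {
        string withBom = "\uFEFF<Sessions />";

        // The BOM character is removed when present...
        Console.WriteLine(StripLeadingBom(withBom) == "<Sessions />");        // True
        // ...and text without a BOM is left alone.
        Console.WriteLine(StripLeadingBom("<Sessions />") == "<Sessions />"); // True
    }
}
```

With a helper like this, the callback body would shrink to something like `resp.Content = BomText.StripLeadingBom(resp.Content);`.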

dataCore answered Oct 04 '22 18:10


I had this same problem, but not specifically with RestSharp. Use this:

var responseXml = new UTF8Encoding(false).GetString(bytes);

Original discussion: XmlReader breaks on UTF-8 BOM

Pertinent quote from the answer:

The xml string must not (!) contain the BOM, the BOM is only allowed in byte data (e.g. streams) which is encoded with UTF-8. This is because the string representation is not encoded, but already a sequence of unicode characters.

Edit: Looking through their docs, it looks like the most straightforward way to handle this (aside from a GitHub issue) is to call the non-generic Execute() method and deserialize the response from that string. You could also create an IDeserializer that wraps the default XML deserializer.
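
To illustrate the quoted point, here is a rough sketch of the "deserialize it yourself" route. Decoding to a string first may leave U+FEFF as the first character, so this version hands the raw bytes to the XML parser as a stream, where a BOM is allowed. The class `BomDemo`, the method `ParseFromBytes`, and the sample payload are invented for the example; with RestSharp, the bytes would presumably come from the response's `RawBytes` after a non-generic `Execute()` call.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class BomDemo
{
    // Hypothetical helper: parses XML from raw bytes. Reading from a stream lets
    // the XML reader detect the encoding and consume a leading BOM itself.
    public static XDocument ParseFromBytes(byte[] rawBytes)
    {
        using (var stream = new MemoryStream(rawBytes))
        {
            return XDocument.Load(stream);
        }
    }

    public static void Main()
    {
        // Made-up payload: UTF-8 BOM (EF BB BF) followed by a small XML document.
        byte[] bom = Encoding.UTF8.GetPreamble();
        byte[] body = Encoding.UTF8.GetBytes("<Sessions><Session><Id>1</Id></Session></Sessions>");
        byte[] payload = bom.Concat(body).ToArray();

        XDocument doc = ParseFromBytes(payload);
        Console.WriteLine(doc.Root.Name); // Sessions
    }
}
```

From the resulting `XDocument` you would still have to map the elements onto your own classes, which is the part the generic `Execute<T>()` call normally does for you.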

Matt Mills answered Oct 04 '22 19:10


The solution that @dataCore posted doesn't quite work, but this one should.

request.OnBeforeDeserialization = resp => {
    if (resp.RawBytes.Length >= 3 && resp.RawBytes[0] == 0xEF && resp.RawBytes[1] == 0xBB && resp.RawBytes[2] == 0xBF)
    {
        // Copy the data but with the UTF-8 BOM removed.
        var newData = new byte[resp.RawBytes.Length - 3];
        Buffer.BlockCopy(resp.RawBytes, 3, newData, 0, newData.Length);
        resp.RawBytes = newData;

        // Force re-conversion to string on next access
        resp.Content = null;
    }
};

Setting resp.Content to null is there as a safety guard, as RawBytes is only converted to a string if Content isn't already set to a value.

NZgeek answered Oct 04 '22 19:10