Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to return xml as UTF-8 instead of UTF-16

I am using a routine that serializes <T>. It works, but when downloaded to the browser I see a blank page. I can view the page source or open the download in a text editor and I see the xml, but it is in UTF-16 which I think is why browser pages show blank?

How do I modify my serializer routine to return UTF-8 instead of UTF-16?

The XML source returned:

<?xml version="1.0" encoding="utf-16"?>
<ArrayOfString xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <string>January</string>
  <string>February</string>
  <string>March</string>
  <string>April</string>
  <string>May</string>
  <string>June</string>
  <string>July</string>
  <string>August</string>
  <string>September</string>
  <string>October</string>
  <string>November</string>
  <string>December</string>
  <string />
</ArrayOfString>

An example call to the serializer:

DateTimeFormatInfo dateTimeFormatInfo = new DateTimeFormatInfo();
var months = dateTimeFormatInfo.MonthNames.ToList();

string SelectionId = "1234567890";

return new XmlResult<List<string>>(SelectionId)
{
    Data = months
};

The Serializer:

public class XmlResult<T> : ActionResult
{
    private string filename = DateTime.Now.ToString("ddmmyyyyhhss");

    public T Data { private get; set; }

    public XmlResult(string selectionId = "")
    {
        if (selectionId != "")
        {
            filename = selectionId;
        }
    }

    public override void ExecuteResult(ControllerContext context)
    {
        HttpContextBase httpContextBase = context.HttpContext;
        httpContextBase.Response.Buffer = true;
        httpContextBase.Response.Clear();

        httpContextBase.Response.AddHeader("content-disposition", "attachment; filename=" + filename + ".xml");
        httpContextBase.Response.ContentType = "text/xml";

        using (StringWriter writer = new StringWriter())
        {
            XmlSerializer xml = new XmlSerializer(typeof(T));
            xml.Serialize(writer, Data);
            httpContextBase.Response.Write(writer);
        }
    }
}
like image 594
rwkiii Avatar asked Sep 08 '14 18:09

rwkiii


People also ask

What is the default encoding for XML?

UTF-8 is the default character encoding for XML documents. Character encoding can be studied in our Character Set Tutorial. UTF-8 is also the default encoding for HTML5, CSS, JavaScript, PHP, and SQL.

Does XML support UTF-16?

What encodings are supported in XML. According to the specification, all XML parsers must be capable of reading documents in at least two encodings: UTF-8 and UTF-16. Many parsers support more encodings, but these should always work.

Is UTF-8 the default encoding?

Fortunately UTF-8 is the default per sé. When reading an XML document and writing it in another encoding, mostly this attribute will be patched too.

What's the difference between UTF-8 and UTF-16?

The main difference between UTF-8, UTF-16, and UTF-32 character encoding is how many bytes it requires to represent a character in memory. UTF-8 uses a minimum of one byte, while UTF-16 uses a minimum of 2 bytes.


2 Answers

You can use a StringWriter that will force UTF8. Here is one way to do it:

public class Utf8StringWriter : StringWriter
{
    // Use UTF8 encoding but write no BOM to the wire
    public override Encoding Encoding
    {
         get { return new UTF8Encoding(false); } // in real code I'll cache this encoding.
    }
}

and then use the Utf8StringWriter writer in your code.

using (StringWriter writer = new Utf8StringWriter())
{
    XmlSerializer xml = new XmlSerializer(typeof(T));
    xml.Serialize(writer, Data);
    httpContextBase.Response.Write(writer);
}

answer is inspired by Serializing an object as UTF-8 XML in .NET

like image 139
Yishai Galatzer Avatar answered Oct 23 '22 11:10

Yishai Galatzer


Encoding of the Response

I am not quite familiar with this part of the framework. But according to the MSDN you can set the content encoding of an HttpResponse like this:

httpContextBase.Response.ContentEncoding = Encoding.UTF8;

Encoding as seen by the XmlSerializer

After reading your question again I see that this is the tough part. The problem lies within the use of the StringWriter. Because .NET Strings are always stored as UTF-16 (citation needed ^^) the StringWriter returns this as its encoding. Thus the XmlSerializer writes the XML-Declaration as

<?xml version="1.0" encoding="utf-16"?>

To work around that you can write into an MemoryStream like this:

using (MemoryStream stream = new MemoryStream())
using (StreamWriter writer = new StreamWriter(stream, Encoding.UTF8))
{
    XmlSerializer xml = new XmlSerializer(typeof(T));
    xml.Serialize(writer, Data);

    // I am not 100% sure if this can be optimized
    httpContextBase.Response.BinaryWrite(stream.ToArray());
}

Other approaches

Another edit: I just noticed this SO answer linked by jtm001. Condensed the solution there is to provide the XmlSerializer with a custom XmlWriter that is configured to use UTF8 as encoding.

Athari proposes to derive from the StringWriter and advertise the encoding as UTF8.

To my understanding both solutions should work as well. I think the take-away here is that you will need one kind of boilerplate code or another...

like image 45
NobodysNightmare Avatar answered Oct 23 '22 11:10

NobodysNightmare