
XmlTextWriter serialization problem

I'm trying to create a piece of XML. I've generated the data classes with xsd.exe. The root class is MESSAGE.

So after creating a MESSAGE and filling all its properties, I serialize it like this:

serializer = new XmlSerializer(typeof(Xsd.MESSAGE));
StringWriter sw = new StringWriter();
serializer.Serialize(sw, response);
string xml = sw.ToString();

Up to this point all goes well: the string xml contains valid (UTF-16 encoded) XML. Now I'd like to create the XML with UTF-8 encoding instead, so I do it like this:

Edit: forgot to include the declaration of the stream

serializer = new XmlSerializer(typeof(Xsd.MESSAGE));
using (MemoryStream stream = new MemoryStream())
{
    XmlTextWriter xtw = new XmlTextWriter(stream, Encoding.UTF8);
    serializer.Serialize(xtw, response);
    string xml = Encoding.UTF8.GetString(stream.ToArray());
}

And here comes the problem: using this approach, the xml string is prefixed with an invalid character (the infamous square).
When I inspect the character like this:

char c = xml[0];

I can see that c has a value of 65279.
Does anybody have a clue where this is coming from?
I can easily solve this by cutting off the first char:

xml = xml.Substring(1);

But I'd rather know what's going on than blindly cut off the first char.

Can anybody shed some light on this? Thanks!

fretje asked Jun 09 '09 13:06


2 Answers

Here's your code modified to not prepend the byte-order-mark (BOM):

var serializer = new XmlSerializer(typeof(Xsd.MESSAGE));
// Passing false to the constructor suppresses the byte-order mark.
Encoding utf8EncodingWithNoByteOrderMark = new UTF8Encoding(false);
using (MemoryStream stream = new MemoryStream())
{
    XmlTextWriter xtw = new XmlTextWriter(stream, utf8EncodingWithNoByteOrderMark);
    serializer.Serialize(xtw, response);
    string xml = Encoding.UTF8.GetString(stream.ToArray());
}
Chris W. Rea answered Nov 16 '22 20:11


65279 (0xFEFF) is the Unicode byte order mark (BOM). You can get rid of it by creating a UTF8Encoding instance which doesn't emit a BOM. (See the constructor overloads for details.)
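
To make the cause visible, here is a minimal sketch that writes the same tiny document twice, once with Encoding.UTF8 and once with new UTF8Encoding(false), and inspects the result. The BomDemo class, the WriteXml helper and the hand-written MESSAGE element are hypothetical stand-ins for the question's serializer call; only the encoding handling is the point.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class BomDemo
{
    // Hypothetical helper: writes a tiny document through XmlTextWriter
    // and returns the raw bytes that ended up in the stream.
    public static byte[] WriteXml(Encoding encoding)
    {
        using (var stream = new MemoryStream())
        {
            var writer = new XmlTextWriter(stream, encoding);
            writer.WriteStartDocument();
            writer.WriteElementString("MESSAGE", "hello");
            writer.WriteEndDocument();
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        byte[] withBom = WriteXml(Encoding.UTF8);
        byte[] withoutBom = WriteXml(new UTF8Encoding(false));

        // Encoding.UTF8 advertises a three-byte preamble: EF BB BF.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetPreamble()));

        // GetString does not strip those bytes; they decode to U+FEFF (65279).
        string xml = Encoding.UTF8.GetString(withBom);
        Console.WriteLine((int)xml[0]);

        // Without the BOM the string starts directly with the XML declaration.
        string clean = Encoding.UTF8.GetString(withoutBom);
        Console.WriteLine(clean[0]);
    }
}
```

The BOM is legitimate at the start of a UTF-8 file or stream; it only becomes a stray character once the bytes are decoded back into a string without skipping the preamble.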

However, there's an easier way of getting UTF-8 out. You can use StringWriter, but a derived class which overrides the Encoding property. See this answer for an example.
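
A minimal sketch of that approach, assuming a hypothetical Utf8StringWriter subclass and a small Message class standing in for the xsd.exe-generated Xsd.MESSAGE (neither name comes from the original post):

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical helper: a StringWriter that reports UTF-8 as its encoding,
// so the XML declaration says encoding="utf-8" instead of "utf-16".
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

// Stand-in for the generated Xsd.MESSAGE class.
public class Message
{
    public string Body { get; set; }
}

public static class Utf8WriterDemo
{
    public static string Serialize(Message message)
    {
        var serializer = new XmlSerializer(typeof(Message));
        using (var sw = new Utf8StringWriter())
        {
            serializer.Serialize(sw, message);
            return sw.ToString();
        }
    }

    public static void Main()
    {
        string xml = Serialize(new Message { Body = "hello" });
        // The string starts with '<': no bytes are involved, so no BOM appears.
        Console.WriteLine(xml);
    }
}
```

Because no byte stream is involved, no BOM is ever written. The string itself is still held as UTF-16 in memory, so the declared encoding only becomes true once the text is written out with a UTF-8 encoder, for example via Encoding.UTF8.GetBytes.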

Jon Skeet answered Nov 16 '22 20:11