
When saving an XmlDocument, it ignores the encoding in the XmlDeclaration (UTF8) and uses UTF16

Tags: c#, xml

I have the following code:

var doc = new XmlDocument();

XmlDeclaration xmlDeclaration = doc.CreateXmlDeclaration("1.0", "UTF-8", null);
doc.AppendChild(xmlDeclaration);

XmlElement root = doc.CreateElement("myRoot");
doc.AppendChild(root);
root.InnerText = "myInnerText";

StringWriter sw = new StringWriter();
doc.Save(sw);
Console.WriteLine(sw.ToString());

Console.WriteLine();

MemoryStream ms = new MemoryStream();
doc.Save(ms);
Console.WriteLine(Encoding.ASCII.GetString(ms.ToArray()));

And here is the output:

<?xml version="1.0" encoding="utf-16"?>
<myRoot>myInnerText</myRoot>

???<?xml version="1.0" encoding="UTF-8"?>
<myRoot>myInnerText</myRoot>

Basically, it builds an XML document and sets the encoding to UTF-8, but when it saves to a StringWriter it ignores my encoding and uses UTF-16. However, when saving to a MemoryStream, it uses UTF-8 (with the extra BOM characters).

Why is this? Why isn't it honouring my explicit encoding setting of utf-8?

Thanks a lot

Chris asked Nov 02 '10 03:11


3 Answers

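Because all you are doing is adding an XML declaration that says the document is UTF-8; you aren't actually saving it as UTF-8. A StringWriter always reports UTF-16 as its encoding (.NET strings are UTF-16), so Save rewrites the declaration to match. You need to make the output writer use UTF-8, like this: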

var doc = new XmlDocument();
XmlElement root = doc.CreateElement("myRoot");
doc.AppendChild(root);
root.InnerText = "myInnerText";
using(TextWriter sw = new StreamWriter("C:\\output.txt", false, Encoding.UTF8)) //Set encoding
{
    doc.Save(sw);
}

Once you do that, you don't even have to add the XML declaration; Save writes one that matches the writer's encoding. If you want to save it to a MemoryStream, use a StreamWriter that wraps the MemoryStream.
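The MemoryStream variant mentioned above could look roughly like the sketch below. The class and method names (MemoryStreamSaveSketch, SaveAsUtf8String) are made up for illustration, and passing new UTF8Encoding(false) instead of Encoding.UTF8 is an extra assumption on my part: it keeps the StreamWriter from emitting a byte order mark.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of the original answer.
public static class MemoryStreamSaveSketch
{
    // Saves the document through a UTF-8 StreamWriter wrapped around a
    // MemoryStream, then decodes the bytes back into a string.
    public static string SaveAsUtf8String(XmlDocument doc)
    {
        using (var ms = new MemoryStream())
        {
            // UTF8Encoding(false) means "UTF-8 without a BOM" (assumption:
            // the BOM is unwanted; use Encoding.UTF8 to keep it).
            using (var sw = new StreamWriter(ms, new UTF8Encoding(false)))
            {
                doc.Save(sw);
            }
            // Disposing the StreamWriter flushes and closes the stream,
            // but MemoryStream.ToArray() still works after that.
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        XmlElement root = doc.CreateElement("myRoot");
        doc.AppendChild(root);
        root.InnerText = "myInnerText";

        Console.WriteLine(SaveAsUtf8String(doc));
    }
}
```

The declaration in the printed XML should name UTF-8 rather than UTF-16, because the writer handed to Save now reports UTF-8 as its encoding.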

vcsjones answered Oct 18 '22 13:10


I use the following method. It pretty-prints the document and writes it out as UTF-8 (without a BOM):

public static string Beautify(XmlDocument doc)
{
    string xmlString = null;
    using (MemoryStream ms = new MemoryStream()) {
        XmlWriterSettings settings = new XmlWriterSettings {
            Encoding = new UTF8Encoding(false),
            Indent = true,
            IndentChars = "  ",
            NewLineChars = "\r\n",
            NewLineHandling = NewLineHandling.Replace
        };
        using (XmlWriter writer = XmlWriter.Create(ms, settings)) {
            doc.Save(writer);
        }
        xmlString = Encoding.UTF8.GetString(ms.ToArray());
    }
    return xmlString;
}

Call it like:

File.WriteAllText(fileName, Utilities.Beautify(xmlDocument));
djunod answered Oct 18 '22 12:10


The MSDN documentation for XmlDocument.Save(TextWriter) says:

The encoding on the TextWriter determines the encoding that is written out (The encoding of the XmlDeclaration node is replaced by the encoding of the TextWriter). If there was no encoding specified on the TextWriter, the XmlDocument is saved without an encoding attribute.

If you want to use the encoding from the XmlDeclaration you'll need to use a stream to save the document.
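Both halves of that rule can be tried out with a small sketch. SaveViaStream uses Save(Stream), which takes the encoding from the XmlDeclaration. SaveViaUtf8StringWriter relies on the quoted behaviour instead: a StringWriter subclass that reports UTF-8 is a common workaround when you want the result as a string. The Utf8StringWriter and DeclarationEncodingSketch names are hypothetical and not part of the framework or of this answer.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: a StringWriter that reports UTF-8 instead of the
// default UTF-16, so XmlDocument.Save writes a UTF-8 declaration.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

public static class DeclarationEncodingSketch
{
    // Builds the same small document as in the question.
    public static XmlDocument BuildDocument()
    {
        var doc = new XmlDocument();
        doc.AppendChild(doc.CreateXmlDeclaration("1.0", "UTF-8", null));
        XmlElement root = doc.CreateElement("myRoot");
        doc.AppendChild(root);
        root.InnerText = "myInnerText";
        return doc;
    }

    // Save(Stream) encodes the output using the declaration's encoding.
    // The StreamReader skips the BOM that the question printed as "???".
    public static string SaveViaStream(XmlDocument doc)
    {
        using (var ms = new MemoryStream())
        {
            doc.Save(ms);
            ms.Position = 0;
            using (var reader = new StreamReader(ms, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }

    // Save(TextWriter) replaces the declaration's encoding with whatever
    // the writer reports, which is UTF-8 for Utf8StringWriter.
    public static string SaveViaUtf8StringWriter(XmlDocument doc)
    {
        using (var sw = new Utf8StringWriter())
        {
            doc.Save(sw);
            return sw.ToString();
        }
    }

    public static void Main()
    {
        XmlDocument doc = BuildDocument();
        Console.WriteLine(SaveViaStream(doc));
        Console.WriteLine();
        Console.WriteLine(SaveViaUtf8StringWriter(doc));
    }
}
```

Note that the string returned by the second method is still UTF-16 in memory, like every .NET string; only the declaration says UTF-8. That is fine as long as the string is later written out with a UTF-8 encoder.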

Pace answered Oct 18 '22 13:10