I have the following code:
var doc = new XmlDocument();
XmlDeclaration xmlDeclaration = doc.CreateXmlDeclaration("1.0", "UTF-8", null);
doc.AppendChild(xmlDeclaration);
XmlElement root = doc.CreateElement("myRoot");
doc.AppendChild(root);
root.InnerText = "myInnerText";
StringWriter sw = new StringWriter();
doc.Save(sw);
Console.WriteLine(sw.ToString());
Console.WriteLine();
MemoryStream ms = new MemoryStream();
doc.Save(ms);
Console.WriteLine(Encoding.ASCII.GetString(ms.ToArray()));
And here is the output:
<?xml version="1.0" encoding="utf-16"?>
<myRoot>myInnerText</myRoot>
???<?xml version="1.0" encoding="UTF-8"?>
<myRoot>myInnerText</myRoot>
Basically, it creates an XML document and sets the encoding to UTF-8, but when the document is saved to a StringWriter, my encoding is ignored and UTF-16 is used. However, when it is saved to a MemoryStream, UTF-8 is used (with the extra BOM characters, which show up as ??? because I decode the bytes as ASCII).
Why is this? Why isn't it honouring my explicit encoding setting of utf-8?
Thanks a lot
Because all you are doing is adding an XML declaration that says the document is UTF-8; you aren't actually saving it as UTF-8. You need to set the output writer to use UTF-8, like this:
var doc = new XmlDocument();
XmlElement root = doc.CreateElement("myRoot");
doc.AppendChild(root);
root.InnerText = "myInnerText";
using (TextWriter sw = new StreamWriter("C:\\output.txt", false, Encoding.UTF8)) // Set the encoding
{
    doc.Save(sw);
}
Once you do that, you don't even have to add the XML declaration; Save writes one with the writer's encoding on its own. If you want to save to a MemoryStream, use a StreamWriter that wraps the MemoryStream.
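As a rough sketch of the MemoryStream variant mentioned above: the helper name SaveAsUtf8Bytes and the class name are made up for illustration, and UTF8Encoding(false) is my own choice to avoid the BOM (Encoding.UTF8 would write one).

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlMemoryStreamDemo
{
    // Hypothetical helper: saves the document through a StreamWriter
    // that wraps a MemoryStream, so the writer's encoding is used.
    public static byte[] SaveAsUtf8Bytes(XmlDocument doc)
    {
        using (var ms = new MemoryStream())
        {
            // UTF8Encoding(false) means "UTF-8 without a byte order mark".
            using (var sw = new StreamWriter(ms, new UTF8Encoding(false)))
            {
                doc.Save(sw);
            } // Disposing the writer flushes it (and closes the stream).

            // ToArray still works after the MemoryStream has been closed.
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        XmlElement root = doc.CreateElement("myRoot");
        doc.AppendChild(root);
        root.InnerText = "myInnerText";

        byte[] bytes = SaveAsUtf8Bytes(doc);
        Console.WriteLine(Encoding.UTF8.GetString(bytes));
    }
}
```

Decoding the bytes with the same encoding that wrote them (UTF-8 here, not ASCII as in the question) avoids the stray ??? characters.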
I use the following method; it pretty-prints the document and writes it as UTF-8 without a BOM:
public static string Beautify(XmlDocument doc)
{
    string xmlString = null;
    using (MemoryStream ms = new MemoryStream()) {
        XmlWriterSettings settings = new XmlWriterSettings {
            Encoding = new UTF8Encoding(false),
            Indent = true,
            IndentChars = " ",
            NewLineChars = "\r\n",
            NewLineHandling = NewLineHandling.Replace
        };
        using (XmlWriter writer = XmlWriter.Create(ms, settings)) {
            doc.Save(writer);
        }
        xmlString = Encoding.UTF8.GetString(ms.ToArray());
    }
    return xmlString;
}
Call it like:
File.WriteAllText(fileName, Utilities.Beautify(xmlDocument));
From MSDN we can see:
The encoding on the TextWriter determines the encoding that is written out (The encoding of the XmlDeclaration node is replaced by the encoding of the TextWriter). If there was no encoding specified on the TextWriter, the XmlDocument is saved without an encoding attribute.
If you want to use the encoding from the XmlDeclaration, you'll need to save the document to a Stream rather than to a TextWriter.
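If you still want the result as a string, one common workaround that follows from the quoted behaviour is to give the StringWriter a different reported encoding. This is only a sketch: Utf8StringWriter and SaveToString are names I made up, not framework types. Note that the string in memory is still UTF-16; only the encoding attribute in the declaration changes, so the text must actually be written out as UTF-8 later for the declaration to be truthful.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper type: a StringWriter that reports UTF-8 as its encoding.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

public static class XmlStringWriterDemo
{
    // Saves the document to a string; the declaration takes the
    // encoding reported by the TextWriter, not the one in the document.
    public static string SaveToString(XmlDocument doc)
    {
        using (var sw = new Utf8StringWriter())
        {
            doc.Save(sw);
            return sw.ToString();
        }
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.AppendChild(doc.CreateXmlDeclaration("1.0", "UTF-8", null));
        XmlElement root = doc.CreateElement("myRoot");
        doc.AppendChild(root);
        root.InnerText = "myInnerText";

        Console.WriteLine(SaveToString(doc));
    }
}
```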