 

Force XDocument to write to String with UTF-8 encoding

I want to be able to write XML to a String with the declaration and with UTF-8 encoding. This seems mighty tricky to accomplish.

I have read around a bit and tried some of the popular answers for this, but they all have issues. My current code correctly outputs UTF-8 but does not preserve the original formatting of the XDocument (i.e. indents / whitespace)!

Can anyone offer some advice please?

XDocument xml = new XDocument(new XDeclaration("1.0", "utf-8", "yes"), xelementXML);

MemoryStream ms = new MemoryStream();
using (XmlWriter xw = new XmlTextWriter(ms, Encoding.UTF8))
{
    xml.Save(xw);
    xw.Flush();

    StreamReader sr = new StreamReader(ms);
    ms.Seek(0, SeekOrigin.Begin);

    String xmlString = sr.ReadToEnd();
}

The XML requires the formatting to be identical to the way .ToString() would format it i.e.

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<root>
    <node>blah</node>
</root>

What I'm currently seeing is

<?xml version="1.0" encoding="utf-8" standalone="yes"?><root><node>blah</node></root> 

Update: I have managed to get this to work by adding XmlWriterSettings. It seems VERY clunky though!

MemoryStream ms = new MemoryStream();
XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = Encoding.UTF8;
settings.ConformanceLevel = ConformanceLevel.Document;
settings.Indent = true;
using (XmlWriter xw = XmlTextWriter.Create(ms, settings))
{
    xml.Save(xw);
    xw.Flush();

    StreamReader sr = new StreamReader(ms);
    ms.Seek(0, SeekOrigin.Begin);
    String blah = sr.ReadToEnd();
}
Chris asked Oct 06 '10 10:10





2 Answers

Try this:

using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

class Test
{
    static void Main()
    {
        XDocument doc = XDocument.Load("test.xml",
                                       LoadOptions.PreserveWhitespace);
        doc.Declaration = new XDeclaration("1.0", "utf-8", null);
        StringWriter writer = new Utf8StringWriter();
        doc.Save(writer, SaveOptions.None);
        Console.WriteLine(writer);
    }

    private class Utf8StringWriter : StringWriter
    {
        public override Encoding Encoding { get { return Encoding.UTF8; } }
    }
}

Of course, you haven't shown us how you're building the document, which makes it hard to test... I've just tried with a hand-constructed XDocument and that contains the relevant whitespace too.
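
For reference, here is a minimal sketch of the hand-constructed variant, assuming a document shaped like the one in the question. The element names `root` and `node` come from the question's sample output; the class name `Demo` is only illustrative. Because the writer reports UTF-8 as its encoding, the saved declaration should say utf-8, and the default save options indent the output.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

// StringWriter normally reports UTF-16; overriding Encoding changes
// what gets written into the XML declaration.
class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding { get { return Encoding.UTF8; } }
}

class Demo
{
    static void Main()
    {
        // Hand-built document mirroring the sample in the question.
        XDocument doc = new XDocument(
            new XDeclaration("1.0", "utf-8", "yes"),
            new XElement("root", new XElement("node", "blah")));

        using (StringWriter writer = new Utf8StringWriter())
        {
            // Save with default options, which indent the output.
            doc.Save(writer);
            string xmlString = writer.ToString();
            Console.WriteLine(xmlString);
        }
    }
}
```

Note that the resulting string is still a normal .NET string in memory; only the declaration text changes. The bytes become UTF-8 when the string is later encoded, for example when written to a file or stream with Encoding.UTF8.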

Jon Skeet answered Oct 08 '22 21:10



Try XmlWriterSettings:

XmlWriterSettings xws = new XmlWriterSettings();
xws.OmitXmlDeclaration = false;
xws.Indent = true;

And pass it on like

using (XmlWriter xw = XmlWriter.Create(sb, xws)) 
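
A fuller sketch of this approach follows, with some assumptions spelled out. The snippet above does not declare `sb`; if it is a StringBuilder, the declaration will typically report utf-16, because the underlying writer's encoding is UTF-16. To get utf-8 in the declaration, this sketch swaps the StringBuilder for a StringWriter subclass that reports UTF-8 (the same idea as the Utf8StringWriter in the other answer). The document contents and the class name `Demo` are illustrative only.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Assumption: a StringWriter that reports UTF-8 replaces the
// StringBuilder (sb) used in the snippet above.
class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding { get { return Encoding.UTF8; } }
}

class Demo
{
    static void Main()
    {
        // Illustrative document; build yours however you need.
        XDocument doc = new XDocument(
            new XDeclaration("1.0", "utf-8", "yes"),
            new XElement("root", new XElement("node", "blah")));

        XmlWriterSettings xws = new XmlWriterSettings();
        xws.OmitXmlDeclaration = false; // keep the <?xml ...?> line
        xws.Indent = true;              // pretty-print child elements

        StringWriter sw = new Utf8StringWriter();
        using (XmlWriter xw = XmlWriter.Create(sw, xws))
        {
            doc.Save(xw);
        } // disposing the XmlWriter flushes its output into sw

        Console.WriteLine(sw.ToString());
    }
}
```

Reading the string only after the using block closes avoids the explicit Flush() call, since disposing the XmlWriter flushes it.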
KMån answered Oct 08 '22 21:10
