
How to put an encoding attribute other than utf-16 into XML with XmlWriter?

I've got a function creating some XmlDocument:

    public string CreateOutputXmlString(ICollection<Field> fields)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.Encoding = Encoding.GetEncoding("windows-1250");

        StringBuilder builder = new StringBuilder();
        XmlWriter writer = XmlWriter.Create(builder, settings);

        writer.WriteStartDocument();
        writer.WriteStartElement("data");
        foreach (Field field in fields)
        {
            writer.WriteStartElement("item");
            writer.WriteAttributeString("name", field.Id);
            writer.WriteAttributeString("value", field.Value);
            writer.WriteEndElement();
        }
        writer.WriteEndElement();
        writer.Flush();
        writer.Close();

        return builder.ToString();
    }

I set an encoding, but the XmlWriter I create still declares utf-16. I know this is because strings (and StringBuilder, I suppose) are encoded in UTF-16 and you can't change that.
So how can I easily create this XML with the encoding attribute set to "windows-1250"? It doesn't even have to be encoded in this encoding; it just has to have the specified attribute.

Edit: it has to work in .NET 2.0, so newer framework features cannot be used.

agnieszka asked Jan 09 '09 11:01



2 Answers

You need to use a StringWriter with the appropriate encoding. Unfortunately StringWriter doesn't let you specify the encoding directly, so you need a class like this:

    public sealed class StringWriterWithEncoding : StringWriter
    {
        private readonly Encoding encoding;

        public StringWriterWithEncoding(Encoding encoding)
        {
            this.encoding = encoding;
        }

        public override Encoding Encoding
        {
            get { return encoding; }
        }
    }

(This question is similar but not quite a duplicate.)

EDIT: To answer the comment: pass the StringWriterWithEncoding to XmlWriter.Create instead of the StringBuilder, then call ToString() on it at the end.
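
To make that concrete, here is a minimal sketch of the approach, written in C# 2.0 syntax. The names `XmlDeclarationDemo` and `CreateXml` and the sample attribute values are made up for illustration, and `StringWriterWithEncoding` is repeated only so the sample compiles on its own. It assumes .NET Framework, where `Encoding.GetEncoding("windows-1250")` resolves without extra setup; on .NET Core and later the code-pages encoding provider has to be registered first.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// A StringWriter that reports whichever encoding it is given.
public sealed class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding encoding;

    public StringWriterWithEncoding(Encoding encoding)
    {
        this.encoding = encoding;
    }

    public override Encoding Encoding
    {
        get { return encoding; }
    }
}

public static class XmlDeclarationDemo
{
    // Hypothetical helper: builds a small document and returns it as a string.
    // The XML declaration takes its encoding name from the writer's Encoding property.
    public static string CreateXml(Encoding declaredEncoding)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;

        StringWriterWithEncoding output = new StringWriterWithEncoding(declaredEncoding);
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("data");
            writer.WriteStartElement("item");
            writer.WriteAttributeString("name", "id1");
            writer.WriteAttributeString("value", "some value");
            writer.WriteEndElement();
            writer.WriteEndElement();
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Assumes .NET Framework; see the note above for newer runtimes.
        Console.WriteLine(CreateXml(Encoding.GetEncoding("windows-1250")));
    }
}
```

The returned value is still an ordinary .NET string; only the name in the declaration changes, which is all the question asks for.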

Jon Skeet answered Sep 21 '22 17:09


Just some extra explanation of why this is so.

Strings are sequences of characters, not bytes. Strings, per se, are not "encoded", because they are using characters, which are stored as Unicode codepoints. Encoding DOES NOT MAKE SENSE at String level.

An encoding is a mapping from a sequence of code points (characters) to a sequence of bytes (for storage on byte-based systems like filesystems or memory). The framework only lets you specify an encoding where there is a compelling reason to, such as when its 16-bit characters have to fit onto byte-based storage.
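
As a small illustration of that mapping, the sketch below turns the same one-character string into bytes with two different encodings. `EncodingDemo` and `ToHex` are names invented for this example; UTF-8 and UTF-16 are used because they are available on every .NET runtime.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: encodes the text and formats the bytes as hex pairs.
    public static string ToHex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        string text = "\u0142"; // LATIN SMALL LETTER L WITH STROKE

        // Same character, two different byte sequences.
        Console.WriteLine("UTF-8:  " + ToHex(text, Encoding.UTF8));    // C5-82
        Console.WriteLine("UTF-16: " + ToHex(text, Encoding.Unicode)); // 42-01
    }
}
```

The string itself never changes; only the bytes produced from it depend on the encoding.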

So when you're trying to write your XML into a StringBuilder, you're actually building an XML sequence of characters and writing them as a sequence of characters, so no encoding is performed. Therefore, no Encoding field.

If you want to use an encoding, the XmlWriter has to write to a Stream.

About the solution that you found with the MemoryStream: no offense intended, but it's just flapping your arms and moving hot air around. You're encoding your code points with 'windows-1250' and then parsing the bytes back into code points. The only change that may occur is that characters not defined in windows-1250 get converted to a '?' character in the process.

To me, the right solution might be the following one. Depending on what your function is used for, you could pass a Stream as a parameter to your function, so that the caller decides whether it should be written to memory or to a file. So it would be written like this:

    public static void WriteFieldsAsXmlDocument(ICollection<Field> fields, Stream outStream)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.Encoding = Encoding.GetEncoding("windows-1250");

        using (XmlWriter writer = XmlWriter.Create(outStream, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("data");
            foreach (Field field in fields)
            {
                writer.WriteStartElement("item");
                writer.WriteAttributeString("name", field.Id);
                writer.WriteAttributeString("value", field.Value);
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
        }
    }
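
A caller could then pick the destination itself. The sketch below is one possible shape, not the only one: the `Field` class is a hypothetical stand-in for the type in the question, the extra `encoding` parameter and the `WriteToMemory` helper are assumptions added so the sample is self-contained, and `Main` assumes .NET Framework, where "windows-1250" resolves without registering a code-pages provider.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical stand-in for the Field type used in the question.
public class Field
{
    public string Id;
    public string Value;

    public Field(string id, string value)
    {
        Id = id;
        Value = value;
    }
}

public static class StreamXmlDemo
{
    // Variant of the method above with the encoding passed in (an assumption of this sketch).
    public static void WriteFieldsAsXmlDocument(ICollection<Field> fields, Stream outStream, Encoding encoding)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.Encoding = encoding;

        using (XmlWriter writer = XmlWriter.Create(outStream, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("data");
            foreach (Field field in fields)
            {
                writer.WriteStartElement("item");
                writer.WriteAttributeString("name", field.Id);
                writer.WriteAttributeString("value", field.Value);
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
        }
    }

    // A caller that chooses memory as the destination and returns the encoded bytes.
    public static byte[] WriteToMemory(ICollection<Field> fields, Encoding encoding)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            WriteFieldsAsXmlDocument(fields, stream, encoding);
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        List<Field> fields = new List<Field>();
        fields.Add(new Field("city", "\u0141\u00F3d\u017A")); // Polish city name with non-ASCII letters

        // Assumes .NET Framework; newer runtimes need the code-pages provider registered.
        Encoding encoding = Encoding.GetEncoding("windows-1250");

        byte[] bytes = WriteToMemory(fields, encoding);
        Console.WriteLine(bytes.Length + " bytes written");
        Console.WriteLine(encoding.GetString(bytes));
    }
}
```

Writing to a file instead only means handing the method a `FileStream`; here the bytes really are in the declared encoding, unlike the string-based variants.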
Laurent LA RIZZA answered Sep 21 '22 17:09