
Non-unicode XML representation

I have XML where some of the element values contain Unicode characters. Is it possible to represent them in an ANSI encoding such as Windows-1252, for example as numeric character references?

E.g.

<?xml version="1.0" encoding="utf-8"?>
<xml>
<value>受</value>
</xml>

to

<?xml version="1.0" encoding="Windows-1252"?>
<xml>
<value>&#27544;</value>
</xml>

I deserialize the XML and then try to serialize it again with XmlTextWriter, specifying Encoding.Default (which is Windows-1252 on my machine). All the Unicode characters that the encoding cannot represent end up as question marks. I'm using Visual Studio 2008 with .NET 3.5 (C# 3.0).

Richard Nienaber asked Feb 04 '26 21:02



2 Answers

Okay, I tested it with the following code:

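 // Requires: using System.IO; using System.Text; using System.Xml; using System.Xml.Linq;
 string xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><xml><value>受</value></xml>";

 // Encoding.Default is the system ANSI code page (Windows-1252 on my machine).
 XmlWriterSettings settings = new XmlWriterSettings { Encoding = Encoding.Default };
 MemoryStream ms = new MemoryStream();
 // Create is a static factory method declared on XmlWriter.
 using (XmlWriter writer = XmlWriter.Create(ms, settings))
 {
     XElement.Parse(xml).WriteTo(writer);
 }

 // Decode the bytes with the same encoding that was used to write them.
 string value = Encoding.Default.GetString(ms.ToArray());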

It correctly escaped the Unicode character as a hexadecimal character reference:

<?xml version="1.0" encoding="Windows-1252"?><xml><value>&#x53D7;</value></xml>

I must be doing something wrong somewhere else. Thanks for the help.
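
One possible explanation for the question marks, offered as an assumption rather than something confirmed in this thread: constructing an XmlTextWriter directly over a stream encodes the text as-is, so characters the encoding cannot represent fall back to ?, whereas a writer obtained from XmlWriter.Create escapes them as character references. The sketch below writes the same document both ways for comparison. The class and method names are hypothetical, and on .NET Core or later the 1252 code page is only available after registering CodePagesEncodingProvider.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class EncodingFallbackDemo
{
    // Writes <xml><value>受</value></xml> through whichever writer is passed in.
    static void WriteSample(XmlWriter writer)
    {
        writer.WriteStartDocument();
        writer.WriteStartElement("xml");
        writer.WriteElementString("value", "\u53D7"); // U+53D7 is 受
        writer.WriteEndElement();
        writer.WriteEndDocument();
        writer.Flush();
    }

    // Legacy path: XmlTextWriter constructed directly over the stream.
    public static string WithXmlTextWriter(Encoding encoding)
    {
        MemoryStream ms = new MemoryStream();
        XmlTextWriter writer = new XmlTextWriter(ms, encoding);
        WriteSample(writer);
        return encoding.GetString(ms.ToArray());
    }

    // Factory path, as in the test above: XmlWriter.Create with XmlWriterSettings.
    public static string WithXmlWriterCreate(Encoding encoding)
    {
        XmlWriterSettings settings = new XmlWriterSettings { Encoding = encoding };
        MemoryStream ms = new MemoryStream();
        using (XmlWriter writer = XmlWriter.Create(ms, settings))
        {
            WriteSample(writer);
        }
        return encoding.GetString(ms.ToArray());
    }

    public static void Main()
    {
        // Windows-1252 is requested explicitly so the result does not depend on the machine's default.
        Encoding ansi = Encoding.GetEncoding(1252);
        Console.WriteLine(WithXmlTextWriter(ansi));
        Console.WriteLine(WithXmlWriterCreate(ansi));
    }
}
```

If the two printed documents differ in the value element, the original serialization code is probably taking the first path.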

Richard Nienaber answered Feb 06 '26 13:02



If I understand the question, then yes. A numeric character reference has to end with a semicolon, so you just need a ; after the 27544:

<?xml version="1.0" encoding="Windows-1252"?>
<xml>
<value>&#27544;</value>
</xml>

Or are you wondering how to generate this XML programmatically? If so, what language/environment are you working in?
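
For the programmatic route, here is a minimal sketch in C# of building decimal numeric character references by hand. EscapeNonAscii is a hypothetical helper, not a framework API. It escapes every character outside ASCII, which is more conservative than Windows-1252 strictly needs, and it does not escape markup characters such as & or <. Note that 受 is U+53D7, which is 21463 in decimal, so the sketch produces &#21463; rather than the 27544 used in the example above.

```csharp
using System;
using System.Text;

public static class CharRefDemo
{
    // Hypothetical helper: replaces each non-ASCII code point with a
    // decimal numeric character reference of the form &#NNNN;
    public static string EscapeNonAscii(string text)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.Length; i++)
        {
            // Combine surrogate pairs into a single code point.
            int codePoint = char.ConvertToUtf32(text, i);
            if (char.IsHighSurrogate(text[i]))
            {
                i++; // skip the low surrogate that was just consumed
            }

            if (codePoint < 128)
            {
                sb.Append((char)codePoint);
            }
            else
            {
                sb.Append("&#").Append(codePoint).Append(';');
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(EscapeNonAscii("受")); // prints &#21463;
    }
}
```

In practice it is usually safer to let an XML writer do the escaping, since it also handles markup characters and the XML declaration.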

Blair Conrad answered Feb 06 '26 14:02



