
C# Converting a string to bytes and then back to a string with default encoder mangles the string

Tags: c#, encoding

I am troubleshooting a strange issue reported by a client, caused by the application trying to parse invalid XML. I have an internal API that gets the XML string (which I know to be valid to begin with), converts it to a byte array, and wraps it in a read-only MemoryStream. On the other side, the stream is converted back to a string and passed to XDocument.Parse(string). That call fails with "Data at the root level is invalid. Line 1, position 1." I believe the root cause is how I am encoding and then decoding the string. In fact, the following line of debugging code returns a different string than what was passed in.

Encoding.Default.GetString(Encoding.Default.GetBytes(GetMeAnXmlString()));

Using Encoding.Default on the way in and then back out yields a different string than what I started with. That's craaaazy. Any ideas?

Note:

The API that retrieves the stream containing the XML is one I cannot change, so I cannot alter its use of Encoding.Default. Doing so would risk production issues (a.k.a. showstoppers) for clients where everything is working fine.

MrCodeMnky asked Nov 12 '13 22:11



1 Answer

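The long and short of it is that Encoding.Default is sketchy because of the code page aspect that Weeble mentioned: on the .NET Framework it maps to the operating system's current ANSI code page, so any character that code page cannot represent does not survive the round trip.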
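To illustrate the kind of loss involved, here is a minimal sketch. It uses Encoding.ASCII as a stand-in for a code page that lacks a character in the input, because the actual behavior of Encoding.Default depends on the machine and runtime. The class name, the RoundTrip helper, and the sample XML are hypothetical and not taken from the question.

```csharp
using System;
using System.Text;

static class LossyRoundTripDemo
{
    // Encode and decode with the same encoding, as the debugging line in the question does.
    public static string RoundTrip(string text, Encoding encoding)
    {
        return encoding.GetString(encoding.GetBytes(text));
    }

    static void Main()
    {
        // Hypothetical sample; \u00EB is 'e' with a diaeresis, which ASCII cannot represent.
        string xml = "<name>Zo\u00EB</name>";

        // Assumption: ASCII stands in for a code page that is missing the character.
        string viaAscii = RoundTrip(xml, Encoding.ASCII);
        string viaUtf8 = RoundTrip(xml, Encoding.UTF8);

        Console.WriteLine(viaAscii);        // <name>Zo?</name>
        Console.WriteLine(viaAscii == xml); // False
        Console.WriteLine(viaUtf8 == xml);  // True
    }
}
```

The unrepresentable character is replaced with '?' during GetBytes, so nothing on the decoding side can recover it.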

http://msdn.microsoft.com/en-us/library/system.text.encoding.default%28v=vs.110%29.aspx and http://blogs.msdn.com/b/shawnste/archive/2005/03/15/don-t-use-encoding-default.aspx

You'd likely be better off just deciding to use Encoding.Unicode or Encoding.UTF8.
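As a rough sketch of what that could look like for the pipeline described in the question (string to byte array to read-only MemoryStream, then back to a string for XDocument.Parse), assuming both the producing and the consuming side can be changed to agree on UTF-8. The class and method names and the sample XML are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

static class Utf8StreamDemo
{
    // Producer side: wrap the UTF-8 bytes in a read-only MemoryStream.
    public static MemoryStream ToStream(string xml)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(xml);
        return new MemoryStream(bytes, writable: false);
    }

    // Consumer side: decode the whole stream with the same encoding.
    public static string FromStream(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // Hypothetical sample containing a non-ASCII character.
        string xml = "<name>Zo\u00EB</name>";

        using (MemoryStream stream = ToStream(xml))
        {
            string decoded = FromStream(stream);
            XDocument doc = XDocument.Parse(decoded);

            Console.WriteLine(decoded == xml);               // True
            Console.WriteLine(doc.Root.Value == "Zo\u00EB"); // True
        }
    }
}
```

The point is only that the same explicit encoding is named on both sides, so the result no longer depends on the machine's code page.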

trope answered Oct 19 '22 07:10