A weird thing in C# Encoding

Tags:

c#

encoding

I convert a byte array to a string, and then convert that string back to a byte array. The two byte arrays are different.

As below:

byte[] tmp = Encoding.ASCII.GetBytes(Encoding.ASCII.GetString(b));

Suppose b is a byte array.

b[0]=3, b[1]=188, b[2]=2 //decimal system

Result:

tmp[0]=3, tmp[1]=63, tmp[2]=2

So that's my problem. What's wrong with it?

roast_soul asked Dec 16 '22 17:12



2 Answers

188 is out of range for ASCII, which only defines the values 0 to 127. Characters that are not in the corresponding character set are replaced with '?' (byte value 63) by design. (Byte 188 is '¼' in ISO 8859-1; would you prefer it to be replaced with "1/4"?)

Rowland Shaw answered Dec 18 '22 07:12



ASCII is 7-bit only, so byte values above 127 are invalid. By default, the ASCII encoding replaces any invalid byte with ? (byte value 63), and that's why you get 63 in tmp[1].
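To make the substitution visible, here is a minimal sketch that reproduces the round trip from the question step by step. The class and method names (AsciiRoundTrip, RoundTrip) are made up for illustration; only Encoding.ASCII, GetString and GetBytes come from the question.

```csharp
using System;
using System.Text;

public static class AsciiRoundTrip
{
    // Hypothetical helper: decode the bytes as ASCII, then encode the string again.
    public static byte[] RoundTrip(byte[] b)
    {
        // Decoding: any byte above 127 is not valid ASCII, so it becomes '?'.
        string s = Encoding.ASCII.GetString(b);

        // Encoding: '?' is an ordinary ASCII character with byte value 63.
        return Encoding.ASCII.GetBytes(s);
    }

    public static void Main()
    {
        byte[] b = { 3, 188, 2 };
        byte[] tmp = RoundTrip(b);
        Console.WriteLine(string.Join(", ", tmp)); // prints "3, 63, 2"
    }
}
```

The original value 188 is already lost in the string, so no second conversion can recover it.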

For an 8-bit character set, you should be looking at either ISO 8859-1 (Latin-1, which is what is usually meant by "Extended ASCII") or code page 437 (which is often confused with Extended ASCII, but is in fact a different encoding).

You can use the following code:

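// ISO 8859-1 maps every byte value 0-255 to a character, so the round trip is lossless.
Encoding enc = Encoding.GetEncoding("iso-8859-1");
// For CP437, use Encoding.GetEncoding(437) instead. On .NET Core and .NET 5+, this
// first requires Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
byte[] tmp = enc.GetBytes(enc.GetString(b));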
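
One way to check whether an encoding is safe for this kind of round trip is to push all 256 byte values through it and compare the result with the input. The following is only a sketch; the class and helper names (RoundTripCheck, RoundTrips) are hypothetical and not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text;

public static class RoundTripCheck
{
    // Hypothetical helper: true if decoding and re-encoding returns the same bytes.
    public static bool RoundTrips(Encoding enc, byte[] data)
    {
        return enc.GetBytes(enc.GetString(data)).SequenceEqual(data);
    }

    public static void Main()
    {
        // Every possible byte value, 0 through 255.
        byte[] all = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();

        Console.WriteLine(RoundTrips(Encoding.GetEncoding("iso-8859-1"), all)); // True
        Console.WriteLine(RoundTrips(Encoding.ASCII, all));                     // False
    }
}
```

ASCII fails the check because every byte from 128 to 255 comes back as 63.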
Alvin Wong answered Dec 18 '22 05:12
