I am retrieving ASCII strings encoded with code page 437 from another system which I need to transform to Unicode so they can be mixed with other Unicode strings.
This is what I am working with:
var asciiString = "\u0094"; // 94 corresponds represents 'ö' in code page 437.
var asciiEncoding = Encoding.GetEncoding(437);
var unicodeEncoding = Encoding.Unicode;
// This is what I attempted to do but it seems not to be able to support the eight bit. Characters using the eight bit are replaced with '?' (0x3F)
var asciiBytes = asciiEncoding.GetBytes(asciiString);
// This work-around does the job, but there must be built in functionality to do this?
//var asciiBytes = asciiString.Select(c => (byte)c).ToArray();
// This piece of code happliy converts the character correctly to unicode { 0x94 } => { 0xF6, 0x0 } .
var unicodeBytes = Encoding.Convert(asciiEncoding, unicodeEncoding, asciiBytes);
var unicodeString = unicodeEncoding.GetString(unicodeBytes); // I want this to be 'ö'.
What I am struggling with is that I cannot find a suitable method in the .NET framework to transform a string with character codes above 127 to a byte array. This seems strange since there are support there to transform a byte array with characters above 127 to Unicode strings.
So my question is, is there any built in method to do this conversion properly or is my work-around the proper way to do it?
var asciiString = "\u0094";
Whatever you name it, this will always be a Unicode string. .NET only has Unicode strings.
I am retrieving ASCII strings encoded with code page 437 from another system
Treat the incoming data as byte[]
, not as string
.
var asciiBytes = new byte[] { 0x94 }; // 94 corresponds represents 'ö' in code page 437.
var asciiEncoding = Encoding.GetEncoding(437);
var unicodeString = asciiEncoding.GetString(asciiBytes);
\u0094
is Unicode code-point 0094, which is a control character; it is not ö
. If you wanted ö
, the correct string is
string s = "ö";
which is LATIN SMALL LETTER O WITH DIAERESIS, aka code-point 00F6.
So:
var s = "\u00F6"; // Identical to "ö"
Now we get our encoding:
var enc = Encoding.GetEncoding(437);
var bytes = enc.GetBytes(s);
And we find that it is a single-byte decimal 148, which is hex 94 - i.e. what you were after.
The significance here is that in C# when you use the "\uXXXX"
syntax, the XXXX is always referring to Unicode code-points, not the encoded value in some particular encoding.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With