How can I convert extended ascii to a System.String?

For example: "½", which is decimal 189 in extended ASCII. When I read the bytes from a text file, the byte[] contains the correct value, in this case 189.

Converting to Unicode results in the Unicode replacement character (65533):

    UnicodeEncoding.Unicode.GetString(b);

Converting to ASCII results in 63, or "?":

    ASCIIEncoding.ASCII.GetString(b);

If this isn't possible, what is the best way to handle this data? I'd like to be able to use string functions such as Replace().

asked Mar 20 '09 14:03 by rtremaine


2 Answers

Byte 189 represents "½" in ISO-8859-1 (also known as "Latin-1"), so the following is probably what you want:

    // Requires: using System.Text;
    var e = Encoding.GetEncoding("iso-8859-1");
    var s = e.GetString(new byte[] { 189 }); // s == "½"

All strings and chars in .NET are UTF-16 encoded, so you need an encoder/decoder to convert anything else. Sometimes the encoding is defaulted (e.g. StreamReader assumes UTF-8 unless told otherwise; FileStream itself only deals in raw bytes), but it is good practice to always specify it.

You will need some form of implicit or (better) explicit metadata to tell you which encoding the data uses.
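To connect this back to the question's goal of calling Replace() on the decoded text, here is a minimal sketch. It assumes the data really is ISO-8859-1; the class and method names (Latin1Example, DecodeLatin1, EncodeLatin1) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Text;

public static class Latin1Example
{
    // ISO-8859-1 maps every byte 0..255 to the Unicode code point
    // with the same number, so no byte is ever rejected or replaced.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Hypothetical helper: bytes -> System.String, assuming Latin-1 input.
    public static string DecodeLatin1(byte[] data)
    {
        return Latin1.GetString(data);
    }

    // Hypothetical helper: System.String -> bytes, for writing the text back out.
    public static byte[] EncodeLatin1(string text)
    {
        return Latin1.GetBytes(text);
    }
}
```

With that in place, `Latin1Example.DecodeLatin1(new byte[] { 49, 189 })` gives the string "1½", and `.Replace("½", ".5")` then works like on any other string. When the bytes come from a file, `File.ReadAllText(path, Encoding.GetEncoding("iso-8859-1"))` does the read and the decode in one call.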

answered Nov 06 '22 14:11 by Richard


The old PC-8 or Extended ASCII character set was around before IBM and Microsoft introduced the idea of code pages to the PC world. This WAS Extended ASCII in 1982. In fact, it was the ONLY character set available on PCs at the time, up until the EGA card allowed you to load other fonts into VRAM.

This was also the default standard for ANSI terminals, and nearly every BBS I dialed up to in the 1980s and early 1990s used this character set for displaying menus and boxes.

Here's the code to turn 8-bit Extended ASCII into Unicode text. Note the key bit of code: the GetEncoding("437"). It uses code page 437 to translate the 8-bit ASCII text to the Unicode equivalent.

    // Decodes bytes as IBM PC code page 437 (requires: using System.Text;).
    string ASCII8ToString(byte[] asciiData)
    {
        var e = Encoding.GetEncoding("437");
        return e.GetString(asciiData);
    }
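
One thing worth checking before picking between the two answers: the same byte decodes differently under each encoding. In code page 437, byte 189 is the box-drawing character "╜" and "½" lives at byte 171, while in ISO-8859-1 byte 189 is "½". The sketch below is only an illustration, assuming .NET Core or .NET 5+, where legacy code pages such as 437 are available only after registering CodePagesEncodingProvider; the class and method names are made up.

```csharp
using System;
using System.Text;

public static class CodePageComparison
{
    // Hypothetical helper: decode bytes with a numeric code page
    // (437 = original IBM PC set, 28591 = ISO-8859-1).
    public static string Decode(byte[] data, int codePage)
    {
        // Assumption: running on .NET Core / .NET 5+, where code page 437
        // is not available until this provider is registered.
        // Registering the same provider again is a no-op.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage).GetString(data);
    }
}
```

Calling `CodePageComparison.Decode(new byte[] { 189 }, 437)` and `CodePageComparison.Decode(new byte[] { 189 }, 28591)` side by side on a sample of the real data is a quick way to see which encoding the file was actually written in.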

answered Nov 06 '22 15:11 by Tom Wilson