
Relation between .NET Encoding and Characterset

What's the relation between the CharacterSet setting described here:
http://msdn.microsoft.com/en-us/library/ms709353(VS.85).aspx
and the ASCII encoding described here:
http://msdn.microsoft.com/en-us/library/system.text.asciiencoding.getbytes(VS.71).aspx

programmernovice asked Dec 08 '22 05:12


2 Answers

ANSI is the current Windows ANSI code page, equivalent to Encoding.Default on the .NET Framework.

OEM is the current OEM code page, typically used by console applications.

You can get the OEM encoding using:

Encoding.GetEncoding(CultureInfo.CurrentCulture.TextInfo.OEMCodePage)

In a console application, the OEM encoding is also available as:

Console.OutputEncoding
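
As a rough sketch, the calls above can be put together in one small console program. It makes a few assumptions beyond the answer: on .NET Core and later, code-page encodings have to be registered through CodePagesEncodingProvider before Encoding.GetEncoding can find them, and Encoding.Default is UTF-8 there, so the ANSI code page is read from TextInfo.ANSICodePage instead. The names CodePageInfo and GetEncodings are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Text;

static class CodePageInfo
{
    // Returns the ANSI and OEM encodings associated with the given culture.
    public static (Encoding Ansi, Encoding Oem) GetEncodings(CultureInfo culture)
    {
        // Assumption: required on .NET Core / .NET 5+; on the .NET Framework
        // the code pages are built in and this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        TextInfo textInfo = culture.TextInfo;
        return (Encoding.GetEncoding(textInfo.ANSICodePage),
                Encoding.GetEncoding(textInfo.OEMCodePage));
    }

    static void Main()
    {
        var (ansi, oem) = GetEncodings(CultureInfo.CurrentCulture);
        Console.WriteLine($"ANSI code page: {ansi.CodePage} ({ansi.WebName})");
        Console.WriteLine($"OEM code page:  {oem.CodePage} ({oem.WebName})");
        Console.WriteLine($"Console output: {Console.OutputEncoding.CodePage}");
    }
}
```

The printed values depend on the machine's regional settings and on how the console was started, so they may not match between two systems.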
Joe answered Dec 10 '22 02:12


This is really, really ancient. ODBC dates from the stone age, back when Windows started taking over from MS-DOS. Back then, lots of text was still encoded in the original IBM-PC character set, named the "OEM Character Set" by Microsoft. The standard IBM-PC set had some accented characters and pseudo-graphics glyphs in the upper half, codes 0x80-0xFF.

That set was too limited for text output in non-English languages, so Microsoft started using code pages: sets of character glyphs suitable for a certain language group. The American English set of characters was standardized by ANSI, and that label is now attached (incorrectly) to any non-OEM code page.

Nobody encodes text in the OEM character set anymore; it went the way of the dodo at least 10 years ago. The proper setting here is ANSI, while keeping your fingers crossed behind your back that the code page used to encode the text matches your system's default code page. That's dodo too; Unicode solved it.
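
To make the "fingers crossed" part concrete, here is a minimal sketch of what goes wrong when the code page used to read text differs from the one used to write it. It assumes code page 1252 as the "ANSI" code page and 437 as the OEM code page (typical for US English systems, but not universal), and it assumes CodePagesEncodingProvider is available, as it is on .NET Core and later. The names CodePageMismatch and Decode are hypothetical.

```csharp
using System;
using System.Text;

static class CodePageMismatch
{
    // Decodes the given bytes using the code page chosen by the caller.
    public static string Decode(byte[] bytes, int codePage)
    {
        // Assumption: needed on .NET Core / .NET 5+ to make 437 and 1252 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage).GetString(bytes);
    }

    static void Main()
    {
        // The single byte 0xE9 is "é" (U+00E9) in Windows-1252.
        byte[] bytes = { 0xE9 };

        string asAnsi = Decode(bytes, 1252); // U+00E9, the intended character
        string asOem = Decode(bytes, 437);   // U+0398, a Greek capital theta

        // Print code points rather than glyphs so the console encoding
        // does not affect what is shown.
        Console.WriteLine($"Read as 1252: U+{(int)asAnsi[0]:X4}");
        Console.WriteLine($"Read as 437:  U+{(int)asOem[0]:X4}");

        // A Unicode encoding such as UTF-8 round-trips without guessing a code page.
        byte[] utf8 = Encoding.UTF8.GetBytes("\u00E9");
        Console.WriteLine(Encoding.UTF8.GetString(utf8) == "\u00E9");
    }
}
```

The bytes in the file never change; only the reader's assumption about the code page does, which is why the setting has to match whatever wrote the text.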

Hans Passant answered Dec 10 '22 03:12