
Why are the ANSI code page and the console code page different?

Microsoft Windows provides several functions to query the current code-page: GetACP, GetConsoleOutputCP, GetConsoleCP.

They return different values. For example, on my machine, GetACP returns 1252 while GetConsoleOutputCP and GetConsoleCP return 437.

(We can also run chcp on the command line and get 437)
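For reference, the three values can be read side by side with a minimal sketch like the one below. It uses Python with ctypes purely for brevity (an assumption; the question itself concerns C++), and `query_code_pages` is a hypothetical helper name. It calls only the three kernel32 functions named above and returns None on non-Windows systems.

```python
import ctypes
import sys


def query_code_pages():
    """Return the ANSI and console code pages, or None when not on Windows.

    Hypothetical helper for illustration only.
    """
    if sys.platform != "win32":
        return None  # these functions exist only in the Windows API
    kernel32 = ctypes.windll.kernel32
    return {
        "GetACP": kernel32.GetACP(),                          # ANSI code page
        "GetConsoleCP": kernel32.GetConsoleCP(),              # console input
        "GetConsoleOutputCP": kernel32.GetConsoleOutputCP(),  # console output
    }


print(query_code_pages())
```

On the machine described in the question this would be expected to report 1252 for GetACP and 437 for the two console functions, matching the numbers quoted above.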

  • Why does Windows provide different code pages for console and non-console?
  • How are these code pages determined per machine?
  • What is the relation between code pages on the same machine? Is there a correlation between the console and non-console code pages? Will machines with ANSI code page 1252 always have a console code page of 437?

The background for this question is an error message from Visual Studio C++:

error C2855: command-line option '/source-charset' inconsistent with precompiled header
error C2855: command-line option '/execution-charset' inconsistent with precompiled header

These errors occurred when the precompiled header file was built with a different default code page than the CPP file that was using it (for whatever reason).
From the MSDN docs:

If no byte-order mark is found, it assumes the source file is encoded using the current user code page, unless you specify a character set name or code page by using the /source-charset option.

So I'm trying to figure out which code page they refer to, the one that is returned by GetACP or the others...
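Why the choice matters can be illustrated with a rough sketch of the rule quoted above: honor a byte-order mark if there is one, otherwise fall back to some code page. This is not the compiler's actual implementation; `decode_source` and the sample source line are hypothetical, Python is used only for brevity, and UTF-32 BOMs are deliberately ignored.

```python
import codecs

# BOM prefixes checked in order, mapped to Python codec names.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_source(raw: bytes, fallback: str) -> str:
    """Decode raw file bytes: honor a BOM if present, else use `fallback`."""
    for bom, codec in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(codec)
    return raw.decode(fallback)


# A BOM-less source line whose string literal contains the byte 0xE9.
raw = b'const char* s = "caf\xe9";'
as_ansi = decode_source(raw, "cp1252")  # 0xE9 interpreted as an ANSI byte
as_oem = decode_source(raw, "cp437")    # 0xE9 interpreted as an OEM byte

# ascii() keeps the output printable on any terminal encoding.
print(ascii(as_ansi))
print(ascii(as_oem))
```

The two results differ in exactly one character, which is the kind of mismatch that makes a precompiled header built under one default incompatible with a CPP file compiled under another.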

Amir Gonnen asked Dec 07 '22 18:12



2 Answers

The ANSI and OEM codepages are determined by the system locale that's loaded when the system boots. They get mapped into every process as the PEB fields AnsiCodePageData and OemCodePageData. The runtime library in ntdll.dll has many functions that work with ANSI and OEM strings, e.g. RtlAnsiStringToUnicodeString and RtlOemStringToUnicodeString.

Functions ending with A in the Windows API use the ANSI codepage, except that file system functions can be switched to OEM via SetFileApisToOEM. The console API defaults to OEM for compatibility with legacy applications, and can be changed to another codepage via SetConsoleCP (input) and SetConsoleOutputCP (output). chcp.com (or mode.com) calls these functions, but it doesn't allow setting the input buffer and screen buffer to different codepages.

If the ANSI codepage is 1252, the OEM codepage isn't necessarily 437. That's only in the U.S. locale. Most Western locales that use 1252 as the ANSI codepage will use 850 as the OEM codepage.
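The practical effect of the ANSI/OEM split can be seen by decoding one byte under the three codepages mentioned here. A minimal sketch in Python, which ships codecs for them on every platform (the `describe` helper is a hypothetical name):

```python
import unicodedata

BYTE = b"\xe9"


def describe(code_page: str) -> str:
    """Name the character that BYTE maps to under the given codepage."""
    ch = BYTE.decode(code_page)
    return f"{code_page}: U+{ord(ch):04X} {unicodedata.name(ch)}"


for cp in ("cp1252", "cp437", "cp850"):
    print(describe(cp))

# Going the other way, the letter e-acute sits at different byte values
# in the ANSI codepage and in the two OEM codepages.
e_acute = "\u00e9"
for cp in ("cp1252", "cp437", "cp850"):
    print(cp, e_acute.encode(cp).hex())
```

The same byte is a different character in each of the three, so text written through an A function and read back through the console (or vice versa) gets garbled unless it is converted.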

An application that says it's using the user code page may not be referring to the system ANSI or OEM codepage. Instead it could be calling, e.g., GetLocaleInfoEx to query the LOCALE_NAME_USER_DEFAULT locale for the LOCALE_IDEFAULTANSICODEPAGE (ANSI) or LOCALE_IDEFAULTCODEPAGE (OEM) value.

Eryk Sun answered Dec 11 '22 09:12



The command console uses a different codepage for legacy reasons. The programs running on the console were often written for DOS, and the character set included things like line drawing characters that would be useful in this context. In a graphical environment with native Windows apps it was more important to expand the available characters since the lines would be drawn directly instead of being simulated in fonts.
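The line-drawing point is easy to check. The sketch below takes the bytes a DOS-era program might emit for a small box (the byte rows are an illustrative assumption, not taken from any particular program) and decodes them under codepage 437 and codepage 1252, using Python only because it ships both codecs.

```python
# Bytes for a 4x3 box using codepage 437 line-drawing characters.
BOX = [
    b"\xda\xc4\xc4\xbf",  # top-left corner, two horizontals, top-right corner
    b"\xb3  \xb3",        # vertical, two spaces, vertical
    b"\xc0\xc4\xc4\xd9",  # bottom-left corner, two horizontals, bottom-right
]


def render(rows, code_page):
    """Decode each row of raw bytes with the given codepage."""
    return [row.decode(code_page) for row in rows]


oem = render(BOX, "cp437")    # box-drawing characters (U+2500 block)
ansi = render(BOX, "cp1252")  # accented Latin letters and punctuation

# ascii() keeps the output printable on any terminal encoding.
for line in oem + ansi:
    print(ascii(line))
```

Under 437 the top row decodes to ┌──┐, while under 1252 the same bytes become ÚÄÄ¿, which is why a console that kept running old programs had to keep the OEM codepage.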

The default code pages are determined by the language Windows will be using. Different languages require different characters, and a single code page wasn't enough to fit all of the characters used by European languages. You will find code page 1250 used in some Central and Eastern European locations for example.
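As a small illustration of why one code page could not serve every language, the sketch below compares code pages 1250 and 1252 on a single byte and on a Polish word. The helper names `name_of` and `fits` are hypothetical, and Python is used only because it includes both codecs.

```python
import unicodedata


def name_of(byte_value: int, code_page: str) -> str:
    """Unicode name of the character a byte maps to in a code page."""
    return unicodedata.name(bytes([byte_value]).decode(code_page))


def fits(text: str, code_page: str) -> bool:
    """True if every character of text can be encoded in the code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


# The same byte value is a different character in each code page.
print(name_of(0xA3, "cp1250"))
print(name_of(0xA3, "cp1252"))

# The Polish city name (L with stroke, o acute, d, z acute),
# written with escapes so the source stays ASCII.
polish = "\u0141\u00f3d\u017a"
print(fits(polish, "cp1250"), fits(polish, "cp1252"))
```

Byte 0xA3 is a Polish letter in 1250 and the pound sign in 1252, and the word encodes in 1250 but not in 1252, so a Polish-language installation needs a different default than a Western European one.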

Mark Ransom answered Dec 11 '22 10:12
