Microsoft Windows provides several functions to query the current code page: GetACP, GetConsoleOutputCP, and GetConsoleCP.
They return different values. For example, on my machine, GetACP returns 1252, while GetConsoleOutputCP and GetConsoleCP return 437.
(Running chcp on the command line also reports 437.)
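To reproduce the comparison above, here is a minimal sketch that calls the three functions through Python's ctypes. The helper name query_code_pages is made up for illustration, and the code assumes kernel32 is reachable, so it returns None on anything other than Windows.

```python
import ctypes
import sys


def query_code_pages():
    """Return the three code pages as a dict, or None when not on Windows.

    query_code_pages is a hypothetical helper name, not a Windows API.
    """
    if sys.platform != "win32":
        return None  # the Win32 API is not available on other platforms
    kernel32 = ctypes.WinDLL("kernel32")
    result = {}
    for name in ("GetACP", "GetConsoleCP", "GetConsoleOutputCP"):
        func = getattr(kernel32, name)
        func.argtypes = []
        func.restype = ctypes.c_uint  # each of these returns a UINT
        result[name] = func()
    return result


if __name__ == "__main__":
    pages = query_code_pages()
    if pages is None:
        print("Not running on Windows; nothing to query.")
    else:
        for name, value in pages.items():
            print(f"{name}() = {value}")
```

The two console functions may report 0 when the process has no console attached, so a 0 there should not be read as a real code page.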
The background for this question is an error message from Visual Studio C++:
error C2855: command-line option '/source-charset' inconsistent with precompiled header
error C2855: command-line option '/execution-charset' inconsistent with precompiled header
These errors occurred when the precompiled header file was built with a different default code page than the CPP file that used it (for whatever reason).
From the MSDN docs:
If no byte-order mark is found, it assumes the source file is encoded using the current user code page, unless you specify a character set name or code page by using the /source-charset option.
So I'm trying to figure out which code page they refer to: the one returned by GetACP, or one of the others...
The ANSI and OEM codepages are determined by the system locale that's loaded when the system boots. They get mapped into every process as the PEB fields AnsiCodePageData and OemCodePageData. The runtime library in ntdll.dll has many functions that work with ANSI and OEM strings, e.g. RtlAnsiStringToUnicodeString and RtlOemStringToUnicodeString.
Functions ending with A in the Windows API use the ANSI codepage, except that file system functions can be switched to OEM via SetFileApisToOEM. The console API defaults to OEM for compatibility with legacy applications, and can be changed to another codepage via SetConsoleCP and SetConsoleOutputCP. chcp.com (or mode.com) calls these functions, but it doesn't allow setting the input buffer and screen buffer to different codepages.
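Since chcp.com always sets both values together, calling the two console functions directly is one way to give the input buffer and the screen buffer different codepages. The sketch below does that through ctypes; set_console_code_pages is a hypothetical helper name, and the values 850 and 437 in the usage part are arbitrary examples.

```python
import ctypes
import sys


def set_console_code_pages(input_cp, output_cp):
    """Set the console input and output code pages independently.

    Returns the previous (input, output) pair so the caller can restore it,
    or None when not running on Windows. Hypothetical helper, for illustration.
    """
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetConsoleCP.restype = ctypes.c_uint
    kernel32.GetConsoleOutputCP.restype = ctypes.c_uint
    kernel32.SetConsoleCP.argtypes = [ctypes.c_uint]
    kernel32.SetConsoleOutputCP.argtypes = [ctypes.c_uint]

    previous = (kernel32.GetConsoleCP(), kernel32.GetConsoleOutputCP())
    # Both setters return a BOOL; zero means the call failed.
    if not kernel32.SetConsoleCP(input_cp):
        raise ctypes.WinError(ctypes.get_last_error())
    if not kernel32.SetConsoleOutputCP(output_cp):
        raise ctypes.WinError(ctypes.get_last_error())
    return previous


if __name__ == "__main__":
    old = set_console_code_pages(850, 437)  # example values only
    if old is None:
        print("Not running on Windows; console code pages left untouched.")
    else:
        print(f"previous input/output code pages: {old}")
        set_console_code_pages(*old)  # restore the original settings
```

The change applies to the console the process is attached to, so restoring the previous pair before exiting avoids surprising whatever runs next in the same window.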
If the ANSI codepage is 1252, the OEM codepage isn't necessarily 437. That's only in the U.S. locale. Most Western locales that use 1252 as the ANSI codepage will use 850 as the OEM codepage.
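To see why it matters which of these codepages is in effect, the sketch below decodes the same single byte under 1252, 437, and 850. It uses Python's built-in cp1252, cp437, and cp850 codecs as stand-ins for the Windows tables, which is an assumption that holds for this byte but is not a claim about every edge case. The helper name decode_under is made up for illustration.

```python
RAW = bytes([0xE9])  # one byte outside the ASCII range


def decode_under(code_pages, data=RAW):
    """Decode the same bytes under each code page number given."""
    return {cp: data.decode(f"cp{cp}") for cp in code_pages}


if __name__ == "__main__":
    for cp, text in decode_under((1252, 437, 850)).items():
        # Show the code point so the result is readable in any terminal.
        print(f"cp{cp}: U+{ord(text):04X}")
```

The byte comes out as U+00E9 (é) under 1252, U+0398 (Θ) under 437, and U+00DA (Ú) under 850, so a file saved under one codepage and read under another silently changes its text.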
An application that says it's using the user code page may not be referring to the system ANSI or OEM codepage. Instead it could be calling, e.g., GetLocaleInfoEx to query the LOCALE_NAME_USER_DEFAULT locale for the LOCALE_IDEFAULTANSICODEPAGE or LOCALE_IDEFAULTCODEPAGE.
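A rough sketch of that query through ctypes follows. Whether the compiler actually does this is not something the documentation confirms; the code only shows what such a lookup could look like. user_locale_code_pages is a hypothetical helper name, the numeric constants are the values from the Windows SDK headers, and passing None as the locale name corresponds to LOCALE_NAME_USER_DEFAULT.

```python
import ctypes
import sys

# Constant values as defined in the Windows SDK headers.
LOCALE_IDEFAULTCODEPAGE = 0x0000000B      # OEM code page of the locale
LOCALE_IDEFAULTANSICODEPAGE = 0x00001004  # ANSI code page of the locale
LOCALE_RETURN_NUMBER = 0x20000000         # return a DWORD instead of a string


def user_locale_code_pages():
    """Return (ansi, oem) for the user default locale, or None off Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    get_info = kernel32.GetLocaleInfoEx
    get_info.argtypes = [ctypes.c_wchar_p, ctypes.c_uint32,
                         ctypes.c_void_p, ctypes.c_int]
    get_info.restype = ctypes.c_int

    def query(lctype):
        value = ctypes.c_uint32(0)
        # With LOCALE_RETURN_NUMBER the buffer size is counted in WCHARs.
        size = ctypes.sizeof(value) // ctypes.sizeof(ctypes.c_wchar)
        # None as the locale name means LOCALE_NAME_USER_DEFAULT.
        if get_info(None, lctype | LOCALE_RETURN_NUMBER,
                    ctypes.byref(value), size) == 0:
            raise ctypes.WinError(ctypes.get_last_error())
        return value.value

    return (query(LOCALE_IDEFAULTANSICODEPAGE), query(LOCALE_IDEFAULTCODEPAGE))


if __name__ == "__main__":
    print(user_locale_code_pages())
```

Comparing this pair with the results of GetACP and the console functions shows whether the user locale and the system locale disagree on a given machine.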
The command console uses a different codepage for legacy reasons. The programs running on the console were often written for DOS, and the character set included things like line drawing characters that would be useful in this context. In a graphical environment with native Windows apps it was more important to expand the available characters since the lines would be drawn directly instead of being simulated in fonts.
The default code pages are determined by the language Windows will be using. Different languages require different characters, and a single code page wasn't enough to fit all of the characters used by European languages. You will find code page 1250 used in some Central and Eastern European locales, for example.
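As a small illustration of why one code page could not cover every European language, the sketch below checks whether a Polish and a Danish place name can be encoded in 1250 and 1252. It relies on Python's cp1250 and cp1252 codecs as approximations of the Windows tables, and can_encode is a made-up helper name.

```python
def can_encode(text, code_page):
    """Report whether every character of text exists in the given code page."""
    try:
        text.encode(f"cp{code_page}")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for word in ("\u0141\u00f3d\u017a", "\u00c6r\u00f8"):  # Łódź, Ærø
        for cp in (1250, 1252):
            print(f"{word!a} in cp{cp}: {can_encode(word, cp)}")
```

The Polish name fits in 1250 but not in 1252, and the Danish name is the other way around, which is why each locale ships with its own default.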