Displaying extended ASCII characters

Tags:

c++

windows

x86

intel

visual-studio-2005

In Visual Studio 2005 on 32-bit Windows, why doesn't my console display characters from 128 to 255?

for example:

cout << "¿" << endl;  //inverted question mark

Output:

┐
Press any key to continue . . .


asked Feb 03 '11 02:02

user3234

4 Answers

A Windows console window is pure Unicode. Its buffer stores text as UCS-2 Unicode (16 bits per character, essentially like original Unicode, a restriction to the Basic Multilingual Plane of modern 21-bit Unicode). So a console window can present almost all kinds of text.

However, for single-byte-per-character I/O (and possibly also for some variable-length encodings), Windows automatically translates to and from the console window's active codepage. If the console window is a cmd.exe instance, you can inspect the active codepage with the command chcp, short for change codepage. Like this:

C:\test> chcp
Active code page: 850

C:\test> _
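
The same information is available from inside a program. Below is a minimal sketch, assuming the Win32 functions GetConsoleCP and GetConsoleOutputCP (the input and output code pages of the attached console). The struct and function names are made up for illustration, and the non-Windows branch exists only so that the snippet compiles elsewhere:

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper type: the two code pages a console window tracks.
struct ConsoleCodePages {
    unsigned input;   // used to translate narrow-character input
    unsigned output;  // used to translate narrow-character output
};

// Queries the attached console. Yields {0, 0} when not built for Windows.
ConsoleCodePages query_console_code_pages() {
#ifdef _WIN32
    return ConsoleCodePages{ GetConsoleCP(), GetConsoleOutputCP() };
#else
    return ConsoleCodePages{ 0, 0 };
#endif
}

// Formats the pair in a chcp-like style.
std::string format_code_pages(const ConsoleCodePages& cp) {
    return "Active code page: " + std::to_string(cp.input) + " (input), "
         + std::to_string(cp.output) + " (output)";
}
```

Printing format_code_pages(query_console_code_pages()) from main should report the same number as chcp for the output side, assuming nothing has changed the code page in between.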

Codepage 850 is an encoding based on the original IBM PC English codepage 437. It is the default for console windows on at least Norwegian PCs (although savvy Norwegians may change that to 865). None of those are codepages that you should use, however.

The original IBM PC codepage (character encoding) is known as OEM, a meaningless acronym for Original Equipment Manufacturer. It had nice line drawing characters suitable for the original PC's text mode screen. More generally, OEM means the default code page for console windows, where codepage 437 is just the original one: it can be configured, e.g. per window via chcp.

When Microsoft created 16-bit Windows they chose another encoding known in Windows as ANSI. The original one was an extension of ISO Latin-1 which for a long while was the default on the Internet (however, it's unclear which came first: Microsoft participated in the standardization). This original ANSI is now known as Windows ANSI Western.

ANSI is the code page used for non-Unicode by almost all the rest of Windows. Console windows use OEM. Notepad, other editors, and so on, use ANSI.

Then, when Microsoft made Windows 32-bit, they adopted a 16-bit extension of Latin-1 known as Unicode; Microsoft was a founding member of the Unicode Consortium. The basic API, including console windows, the file system, etc., was rewritten to use Unicode. For backward compatibility there is a translation layer that translates between OEM and Unicode for console windows, and between ANSI and Unicode for other functionality. For example, MessageBoxA is an ANSI wrapper for the Unicode-based MessageBoxW.

The practical upshot is that in Windows your C++ source code is typically encoded as ANSI, while console windows assume OEM. That makes, for example,

cout << "I like Norwegian blåbærsyltetøy!" << endl;

produce pure gobbledegook… You can use the Unicode-based console window APIs to output Unicode directly to a console window, avoiding the translation, but that's awkward.
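
A minimal sketch of that direct route, assuming the Win32 calls GetStdHandle, GetConsoleMode and WriteConsoleW. The function name write_console_utf16 is made up here, and the non-Windows branch exists only so that the snippet compiles elsewhere:

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Writes UTF-16 text straight into the console buffer, bypassing the
// code page translation. Returns false when stdout is not a console
// (for example, redirected to a file) or when not built for Windows.
bool write_console_utf16(const std::wstring& text) {
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    if (out == INVALID_HANDLE_VALUE || !GetConsoleMode(out, &mode)) {
        return false;  // not an interactive console
    }
    DWORD written = 0;
    const BOOL ok = WriteConsoleW(out, text.c_str(),
                                  static_cast<DWORD>(text.size()),
                                  &written, nullptr);
    return ok != 0 && written == text.size();
#else
    (void)text;  // nothing to do on other platforms
    return false;
#endif
}
```

A call such as write_console_utf16(L"bl\u00E5b\u00E6rsyltet\u00F8y\n") would then show the word correctly, provided the console font has those glyphs. The awkward part is that this only works for a real console: redirected output needs a separate code path.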

Note that using wcout instead of cout doesn't help: in its default mode wcout just translates down from wide character strings to the program's narrow character set, discarding information on the way. It can be hard to believe that the C++ standard library offers a rather big chunk of very complex functionality that is meaningless (since those conversions could instead have been supported by cout), but so it is. Possibly it was some kind of political compromise. Either way, wcout does not help here, even though logically it "should".

So how does a Norwegian novice programmer get e.g. "blåbærsyltetøy" presented?

Well, simply by changing the active code page to ANSI. Since ANSI is codepage 1252 on most PCs in Western countries, you can do that for a given command interpreter instance by

C:\test> chcp 1252
Active code page: 1252

C:\test> _
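
The same switch can be made from inside the program rather than at the prompt. Below is a minimal sketch, assuming the Win32 calls GetConsoleOutputCP and SetConsoleOutputCP. The class name ScopedOutputCodePage is made up here, and 1252 is only the right argument where ANSI is Windows-1252. The console font must also contain the characters (raster fonts may not), which this sketch does not check:

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical RAII helper: switches the console output code page for the
// lifetime of the object and restores the previous one afterwards.
class ScopedOutputCodePage {
public:
    explicit ScopedOutputCodePage(unsigned code_page) {
#ifdef _WIN32
        previous_ = GetConsoleOutputCP();
        changed_ = SetConsoleOutputCP(code_page) != 0;
#else
        (void)code_page;  // no console code pages outside Windows
#endif
    }

    ~ScopedOutputCodePage() {
#ifdef _WIN32
        if (changed_) {
            SetConsoleOutputCP(previous_);
        }
#endif
    }

    ScopedOutputCodePage(const ScopedOutputCodePage&) = delete;
    ScopedOutputCodePage& operator=(const ScopedOutputCodePage&) = delete;

    // True if the code page was actually switched.
    bool changed() const { return changed_; }

private:
    unsigned previous_ = 0;
    bool changed_ = false;
};
```

Declaring ScopedOutputCodePage guard(1252); at the top of main, before the cout line, has the same effect as typing chcp 1252 first, but only for the duration of the program.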

Now old DOS programs such as edit.com (still present in Windows XP!) will produce some gobbledegook, because the original PC character set's line drawing characters do not exist in ANSI, and because national characters have different codes in ANSI. But hey, who uses old DOS programs? Not me!

If you want this as a more permanent code page, you'll have to change the configuration of console windows via an undocumented registry key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage

In this key, change the value of OEMCP to 1252, and reboot.

As with chcp, or any other change of codepage to 1252, this makes old DOS programs present gobbledegook, but makes C++ programs and other modern console programs work OK, since you then have the same character encoding in console windows as in the rest of Windows.


answered Oct 02 '22 00:10

Cheers and hth. - Alf

I'm running on Windows 10 build 19043. Changing to the UTF-8 codepage (65001) allows printing and displaying extended ASCII characters in the CMD window. Just run this line in your console or batch file and all should be good:

chcp 65001 1>nul
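
Code page 65001 is UTF-8, so the console then expects the program to emit UTF-8 bytes: ¿ (U+00BF) becomes the two bytes C2 BF rather than the single Windows-1252 byte BF. Below is a minimal sketch of that encoding step. The function name to_utf8 is made up for illustration, and it does not validate its input (surrogates and values above U+10FFFF are not rejected):

```cpp
#include <string>

// Encodes one Unicode code point as UTF-8 (1 to 4 bytes).
// Assumption: cp is a valid scalar value; no validation is done.
std::string to_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With the code page set to 65001, writing to_utf8(0xBF) to cout should display ¿, whereas a source file saved as Windows-1252 would still send the lone byte BF.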


answered Oct 01 '22 23:10

Eric

When you print a narrow (char) string, Windows internally converts it to Unicode based on the current code page. The CRT also translates from Unicode to the narrow character set when you write wide characters. Switching stdout to UTF-16 mode avoids that second translation, so the following works:

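#include <fcntl.h>    // _O_U16TEXT
#include <io.h>       // _setmode
#include <stdio.h>    // _fileno, stdout
#include <iostream>

int main()
{
    // Put stdout in UTF-16 text mode so wide output is not narrowed by the CRT.
    // After this call, use only wide output (wcout, wprintf) on stdout.
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wcout << L"\u00BF" << std::endl;  // inverted question mark
    return 0;
}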

answered Oct 02 '22 00:10

John

Because the Win32 console uses code page 437 (aka the OEM font) to render characters, whereas most of the rest of Windows uses Windows-1252 for single-byte character codes.

The character "¿" is the Unicode character INVERTED QUESTION MARK, which has code point 0xBF (191 decimal) in Unicode, ISO 8859-1, and Windows-1252. The code point 0xBF in CP437 corresponds to the character "┐", which is BOX DRAWINGS LIGHT DOWN AND LEFT (code point U+2510).
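
The mismatch can be sketched as a tiny decoder. This is an illustration only: the names CodePage and decode_byte are made up, and apart from ASCII the table covers just the one byte discussed above (a real table has 256 entries per code page):

```cpp
// Which single-byte code page to interpret a byte under.
enum class CodePage { Cp437, Windows1252 };

// Returns the Unicode code point a byte stands for under the given code page.
// Only ASCII and byte 0xBF are covered; anything else yields U+FFFD
// (REPLACEMENT CHARACTER) to mark "not in this sketch".
char32_t decode_byte(unsigned char byte, CodePage cp) {
    if (byte < 0x80) {
        return byte;  // both code pages agree with ASCII here
    }
    if (byte == 0xBF) {
        return cp == CodePage::Cp437
            ? 0x2510   // BOX DRAWINGS LIGHT DOWN AND LEFT
            : 0x00BF;  // INVERTED QUESTION MARK
    }
    return 0xFFFD;
}
```

The compiler stores "¿" as byte 0xBF when the source is Windows-1252, and the console then looks up that same byte in CP437, which is the decode_byte(0xBF, CodePage::Cp437) case.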

As long as the Windows console is left at code page 437, narrow-character output can display only the characters in CP437 and no others. If you want to display other Unicode characters, you'll need to change the code page or use a different environment.

answered Oct 01 '22 22:10

Adam Rosenfield
