
How to print Latin characters to the C++ console properly on Windows?

I'm having a problem writing French characters to the console in C++. The string is loaded from a file using std::ifstream and std::getline and then printed to the console using std::cout. Here is the string as it appears in the file:

La chaîne qui correspond au code "TEST_CODE" n'a pas été trouvée à l'aide locale "fr".

And here is how the string is being printed:

La cha¯ne qui correspond au code "TEST_CODE" n'a pas ÚtÚ trouvÚe Ó l'aide locale "fr".

How can I fix this problem?

jmegaffin asked Nov 15 '12 03:11


2 Answers

The issue is that the console uses a different code page than the rest of the system. For example, Windows systems set up for the Americas and Western Europe normally use CP1252 as the system (ANSI) code page, but the console in those regions defaults to CP437 or CP850.

You can either set the console output code page to match the encoding you're using or you can convert the strings to match the console's output code page.

Set the console output codepage:

// Requires <windows.h>. GetACP() returns the system (ANSI) code page.
SetConsoleOutputCP(GetACP());
std::cout << "La chaîne qui correspond au code \"TEST_CODE\" n'a pas été trouvée à l'aide locale \"fr\".";

Or use one of the many ways to convert between encodings (this one requires VS2010 or later; note that std::wstring_convert has been deprecated since C++17):

#include <codecvt> // kept from the original; the standard declares both names below in <locale>
#include <locale>  // for wstring_convert and codecvt_byname
#include <iostream>

int main() {
    typedef std::codecvt_byname<wchar_t,char,std::mbstate_t> codecvt;

    // The following relies on non-standard behavior: codecvt destructors are
    // supposed to be protected and unusable here, but VC++ doesn't complain.
    std::wstring_convert<codecvt> cp1252(new codecvt(".1252"));
    std::wstring_convert<codecvt> cp850(new codecvt(".850"));

    std::cout << cp850.to_bytes(cp1252.from_bytes("...été trouvée à...\n"));
}

The latter example assumes that you do in fact need to convert from CP1252 to CP850. You should probably call GetOEMCP() to find the actual target code page. The source code page depends on the encoding of your source file (or, for text read from a file, the encoding of that file), not on what GetACP() returns on the machine running the program.

Also note that this program relies on something not guaranteed by the standard: that the wchar_t encoding be shared between locales. This is true on most platforms, where some Unicode encoding is usually used for wchar_t in all locales, but not on all of them.


Ideally you could just use UTF-8 everywhere and the following would work fine, as it does on other platforms these days:

#include <iostream>

int main() {
    std::cout << "La chaîne qui correspond au code \"TEST_CODE\" n'a pas été trouvée à l'aide locale \"fr\".\n";
}

Unfortunately, Windows can't support UTF-8 this way without either abandoning UTF-16 as the wchar_t encoding and adopting a 4-byte wchar_t, or violating requirements of the standard and breaking standard-conforming programs.

bames53 answered Sep 18 '22 23:09


If you want to write Unicode characters to the console, you have to do some initialization first:

_setmode(_fileno(stdout), _O_U16TEXT);

Then your French characters are displayed correctly (I've tested it using Consolas as my console font):

#include <fcntl.h>
#include <io.h>

#include <iostream>
#include <ostream>
#include <string>

using namespace std;

int main() 
{
    // Prepare console output in Unicode
    _setmode(_fileno(stdout), _O_U16TEXT);


    //
    // Build Unicode UTF-16 string with French characters
    //

    // 0x00EE - LATIN SMALL LETTER I WITH CIRCUMFLEX
    // 0x00E9 - LATIN SMALL LETTER E WITH ACUTE
    // 0x00E0 - LATIN SMALL LETTER A WITH GRAVE

    wstring str(L"La cha");
    str += L'\x00EE';
    str += L"ne qui correspond au code \"TEST_CODE\" ";
    str += L"n'a pas ";
    str += L'\x00E9';
    str += L't';
    str += L'\x00E9';
    str += L" trouv";
    str += L'\x00E9';
    str += L"e ";
    str += L'\x00E0';
    str += L" l'aide locale \"fr\".";


    // Print the string to the console
    wcout << str << endl;  
}

Consider reading the following blog posts by Michael Kaplan:

  • Myth busting in the console
  • Conventional wisdom is retarded, aka What the @#%&* is _O_U16TEXT?

Moreover, if you are reading text from a file, you have to know which encoding it uses: UTF-8? UTF-16LE? UTF-16BE? Some specific code page? Once you know, you can convert from that encoding to UTF-16 and use UTF-16 inside the Windows application. To convert from a code page (or from UTF-8) to UTF-16, you can use the MultiByteToWideChar() API or the ATL conversion helper class CA2W.

Mr.C64 answered Sep 18 '22 23:09