 

Why are certain Unicode characters causing std::wcout to fail in a console app?

Consider the following code snippet, compiled as a Console Application on MS Visual Studio 2010/2012 and executed on Win7:

#include "stdafx.h"
#include <iostream>
#include <string>


const std::wstring test = L"hello\xf021test!";

int _tmain(int argc, _TCHAR* argv[])
{
    std::wcout << test << std::endl;
    std::wcout << L"This doesn't print either" << std::endl;

    return 0;
}

The first wcout statement outputs "hello" (instead of something like "hello?test!"). The second wcout statement outputs nothing.

It's as if 0xf021 (and possibly other Unicode characters) causes wcout to fail.

This particular Unicode character, 0xf021 (encoded as UTF-16), is part of the "Private Use Area" in the Basic Multilingual Plane. I've noticed that Windows Console applications do not have extensive support for Unicode characters, but typically each character is at least represented by a default character (e.g. "?"), even if there is no support for rendering a particular glyph.
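For reference, the Private Use Area of the Basic Multilingual Plane spans U+E000 through U+F8FF. A minimal sketch of a range check follows; the helper name `is_bmp_private_use` is made up for illustration and is not part of any library.

```cpp
#include <cstdint>

// Hypothetical helper: true when the code point lies in the BMP
// Private Use Area (U+E000..U+F8FF).
constexpr bool is_bmp_private_use(std::uint32_t code_point)
{
    return code_point >= 0xE000 && code_point <= 0xF8FF;
}
```

Under this check, 0xF021 from the snippet above falls inside the range, while ordinary Cyrillic or CJK code points do not.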

What is causing the wcout stream to choke? Is there a way to reset it after it enters this state?

asked Oct 05 '13 02:10 by charunnera


2 Answers

wcout, or to be precise, a wfilebuf instance it uses internally, converts wide characters to narrow characters, then writes those to the file (in your case, to stdout). The conversion is performed by the codecvt facet in the stream's locale; by default, that just does wctomb_s, converting to the system default ANSI codepage, aka CP_ACP.

Apparently, the character '\xf021' is not representable in the default codepage configured on your system. So the conversion fails, and an error flag is set on the stream (wcout.fail() then returns true). Once the stream is in that state, all subsequent output calls fail immediately until the state is cleared.
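The failed conversion can be probed directly, without a console. The sketch below asks the codecvt facet of the classic "C" locale to convert a single wide character. The helper name `convertible_in_classic_locale` is hypothetical, and the classic locale is only a stand-in: which characters are representable depends on the platform and on the locale the stream is actually imbued with.

```cpp
#include <cwchar>
#include <locale>

// Hypothetical helper (not from the answer above): reports whether the
// codecvt facet of the classic "C" locale can convert one wide character
// to a narrow multibyte sequence.
bool convertible_in_classic_locale(wchar_t wc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Facet;
    const Facet& facet = std::use_facet<Facet>(std::locale::classic());

    std::mbstate_t state = std::mbstate_t();  // initial conversion state
    const wchar_t* from_next = 0;
    char buffer[16];                          // room for one multibyte char
    char* to_next = 0;

    // out() is the same facet operation a wide file buffer relies on
    // when it flushes wide characters to a narrow file.
    const std::codecvt_base::result result =
        facet.out(state, &wc, &wc + 1, from_next,
                  buffer, buffer + sizeof buffer, to_next);
    return result == std::codecvt_base::ok;
}
```

With this helper, L'h' converts while L'\xf021' is rejected, which is the kind of failure that puts wcout into its error state.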

I do not know of any way to get wcout to successfully print arbitrary Unicode characters to console. wprintf works though, with a little tweak:

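#include <fcntl.h>   // _O_U16TEXT
#include <io.h>      // _setmode
#include <stdio.h>   // _fileno, stdout, wprintf
#include <tchar.h>   // _tmain, _TCHAR
#include <string>

const std::wstring test = L"hello\xf021test!";

int _tmain(int argc, _TCHAR* argv[])
{
  // Put stdout into UTF-16 text mode (specific to the Microsoft CRT).
  _setmode(_fileno(stdout), _O_U16TEXT);
  // Pass the text as an argument rather than as the format string.
  wprintf(L"%ls", test.c_str());

  return 0;
}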

answered Nov 10 '22 20:11 by Igor Tandetnik


Setting the mode for stdout to _O_U16TEXT lets you write Unicode characters through the wcout stream as well as through wprintf. (See "Conventional wisdom is retarded, aka What the @#%&* is _O_U16TEXT?") This is the right way to make this work.

_setmode(_fileno(stdout), _O_U16TEXT);

std::wcout << L"hello\xf021test!" << std::endl;
std::wcout << L"\x043a\x043e\x0448\x043a\x0430 \x65e5\x672c\x56fd" << std::endl;
std::wcout << L"Now this prints!" << std::endl;

It shouldn't be necessary anymore, but you can reset a stream that has entered an error state by calling clear:

if (std::wcout.fail())
{
    std::wcout.clear();
}
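
The sticky error state and the effect of clear can be seen in a portable way with a string stream. This is only a sketch: the `setstate` call is an assumed stand-in for the failed character conversion, and the function name `demo_sticky_error_state` is made up for illustration.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical demo: returns what reaches the buffer after a write, a
// simulated failure, a write attempted while failed, clear(), and a
// final write.
std::wstring demo_sticky_error_state()
{
    std::wostringstream out;
    out << L"before;";
    out.setstate(std::ios_base::badbit);  // stand-in for the failed conversion
    out << L"dropped;";                   // ignored: the stream is not good()
    out.clear();                          // reset the state to goodbit
    out << L"after;";                     // output works again
    return out.str();
}
```

The text written while the stream was in the error state is lost, so the result is L"before;after;". Clearing the state does not replay anything that was dropped.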

answered Nov 10 '22 20:11 by Eric MSFT