I'm trying to print a Chinese character using the types wchar_t, char16_t and char32_t, to no avail.

Tags: c++, cout, c++14

I'm trying to print the Chinese character 中 using the types wchar_t, char16_t and char32_t, without success (live example):

#include <iostream>
int main()
{
    char x[] = "中";            // Chinese character with unicode point U+4E2D
    char y[] = u8"中";
    wchar_t z = L'中';
    char16_t b = u'\u4e2d';
    char32_t a = U'\U00004e2d';

    std::cout << x << '\n';     // Ok
    std::cout << y << '\n';     // Ok
    std::wcout << z << '\n';    // ?? 
    std::cout << a << '\n';     // prints the decimal number (20013) corresponding to the unicode point U+4E2D
    std::cout << b << '\n';     //             "                    "                   "
}

François-Marie Arouet asked Jul 22 '15 18:07



1 Answer

Since you're running your test on a Linux system, the source code is UTF-8, which is why x and y are the same thing. Those bytes are shunted, unmodified, into the standard output by std::cout << x and std::cout << y, and when you view the web page (or look at the Linux terminal), you see the character as you expected.
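
To make the "same bytes" point concrete, here is a minimal sketch that dumps the bytes of a narrow string in hex. The helper name hex_bytes is hypothetical and not part of the question or the answer.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: renders each byte of a narrow string as two hex
// digits, separated by single spaces.
std::string hex_bytes(const std::string& s)
{
    std::string out;
    char buf[4]; // two hex digits, one space, terminating NUL
    for (unsigned char byte : s) {
        std::snprintf(buf, sizeof buf, "%02x ", static_cast<unsigned>(byte));
        out += buf;
    }
    if (!out.empty())
        out.pop_back(); // drop the trailing space
    return out;
}
```

Assuming the source file is saved as UTF-8, hex_bytes(x) and hex_bytes(y) for the arrays in the question should both give e4 b8 ad, the three-byte UTF-8 encoding of U+4E2D.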

std::wcout << z will print if you do two things:

std::ios::sync_with_stdio(false);
std::wcout.imbue(std::locale("en_US.utf8"));

Without unsynching from C, GNU libstdc++ goes through C I/O streams, which can never print a wide char after printing a narrow char on the same stream (a C stream's orientation, byte or wide, is fixed by the first operation on it). LLVM libc++ appears to work even when synched, but of course still needs the imbue to tell the stream how to convert the wide chars to the bytes it sends into the standard output.
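
The two steps can be wrapped in a small helper, sketched below. The name enable_wide_output is hypothetical; the locale name "en_US.utf8" is taken from the lines above and is an assumption about what is installed on the machine, so the sketch reports failure instead of letting the exception escape.

```cpp
#include <iostream>
#include <locale>
#include <stdexcept>

// Hypothetical helper: prepares std::wcout for converting wide characters
// to bytes. Returns false if the named locale is not available.
bool enable_wide_output(const char* locale_name = "en_US.utf8")
{
    // Should run before any other I/O on the standard streams.
    std::ios::sync_with_stdio(false);
    try {
        // The locale's codecvt facet converts wchar_t to the output bytes.
        std::wcout.imbue(std::locale(locale_name));
    } catch (const std::runtime_error&) {
        return false; // the C++ runtime does not know this locale name
    }
    return true;
}

// Possible use at the top of main():
//     if (enable_wide_output()) std::wcout << L'中' << L'\n';
```

Mixing std::cout and std::wcout in the same program is still best avoided where possible, since whether it works depends on the standard library in use.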

To print b and a, you will have to convert them to wide or narrow; even with wbuffer_convert, setting up a char32_t stream is a lot of work. The conversion to narrow would look like this:

// needs #include <codecvt> and #include <locale>
std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv32;
std::cout << conv32.to_bytes(a) << '\n';
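
A sketch of the same conversion as reusable functions, covering both a (char32_t) and b (char16_t). The name to_utf8 is hypothetical, and the use of std::codecvt_utf8_utf16 for the char16_t case is an assumption added here, not something the snippet above shows. Note that std::wstring_convert and the <codecvt> facets are deprecated since C++17, so this fits the question's C++14 setting but may produce warnings with newer standards.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: encodes one UTF-32 code point as UTF-8 bytes.
std::string to_utf8(char32_t c)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv32;
    return conv32.to_bytes(c);
}

// Hypothetical helper: encodes one UTF-16 code unit as UTF-8 bytes.
// A lone surrogate is not a complete character, so the conversion is
// expected to fail (std::range_error) for such input.
std::string to_utf8(char16_t c)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv16;
    return conv16.to_bytes(c);
}
```

With these, std::cout << to_utf8(a) << '\n' and std::cout << to_utf8(b) << '\n' write UTF-8 bytes to the narrow stream, which a UTF-8 terminal displays the same way as x and y.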

Putting it all together: http://coliru.stacked-crooked.com/a/a809c38e21cc1743

Cubbi answered Oct 02 '22 21:10