 

C++: wide characters outputting incorrectly?

My code is basically this:

    wstring japan = L"日本";
    wstring message = L"Welcome! Japan is ";

    message += japan;

    wprintf(message.c_str());

I want to use wide strings, but I don't know how they are supposed to be output, so I used wprintf. When I run something like:

    ./widestr | hexdump

The hexadecimal output looks like this:

    65 57 63 6c 6d 6f 21 65 4a 20 70 61 6e 61 69 20 20 73 3f 3f
    e  W  c  l  m  o  !  e  J     p  a  n  a  i        s  ?  ?

Why are they all jumbled? Even if wprintf is the wrong function to use, I still don't understand why it would output the characters in such a specific jumbled order!

Edit: Endianness or something? Every two characters seem to be swapped. Huh.

Edit 2: I tried using wcout instead, but it produces exactly the same hexadecimal output. Weird!

John D. asked Jun 28 '10 06:06



People also ask

What is a wide character in C?

A wide character is a computer character datatype that generally has a size greater than the traditional 8-bit character. The increased datatype size allows for the use of larger coded character sets. UTF-16 is one of the most commonly used wide character encodings.

What is wchar_t in C?

The wchar_t type is an implementation-defined wide character type. With the Microsoft compiler it is a 16-bit type that stores Unicode encoded as UTF-16LE, the native character type on Windows. With GCC and Clang on Linux it is usually a 32-bit type that holds one UTF-32 code point per element.

What is the wide character range?

Wide characters are similar to the char type. The main difference is that char occupies 1 byte, whereas wchar_t occupies 2 bytes on some platforms (e.g. Windows) and 4 bytes on others (e.g. Linux with GCC), depending on the compiler. A 2-byte wide character can hold 65,536 (64K) distinct values, which covers the Basic Multilingual Plane (BMP); characters beyond it need two 16-bit units (a UTF-16 surrogate pair). A 4-byte wchar_t can hold any Unicode code point directly.

What is wide character type?

A wide character is a multilingual character code wider than a single byte (2 bytes on Windows, 4 bytes on most Unix-like systems). Any character in use in modern computing worldwide, including technical symbols and special publishing characters, can be represented according to the Unicode specification as one or more wide characters.


1 Answer

You need to set the locale:

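    #include <cwchar>    // wprintf
    #include <string>
    #include <locale>
    #include <iostream>

    using namespace std;

    int main()
    {
            // Use the environment's locale (e.g. en_US.UTF-8) instead of the default "C" locale
            std::locale::global(std::locale(""));
            wstring japan = L"日本";
            wstring message = L"Welcome! Japan is ";

            message += japan;

            // Pass the string as an argument, never as the format string itself
            wprintf(L"%ls\n", message.c_str());
            wcout << message << endl;
    }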

This works as expected: the wide string is converted to narrow UTF-8 and printed.

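Setting the global locale to "" selects the user's default locale from the environment (the LANG and LC_* variables). If that locale uses UTF-8, the wstring is converted to UTF-8 on output. The default "C" locale has no narrow representation for 日本, which is presumably why the question's output shows ?? (3f 3f) in place of the Japanese characters.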

Edit: Ignore what I said earlier about sync_with_stdio; that was not correct. The C++ streams are synchronized with C stdio by default, so calling it is not needed.

Artyom answered Sep 18 '22 23:09
