
How to initialize or assign 中文 to wstring?

I tried to use L"string", but it doesn't work.

#include <iostream>
#include <string>
using namespace std;

int main(){
    wstring wstr = L"你好";//[Error] converting to execution character set: Illegal byte sequence
    wcout<<wstr<<endl;
}

Using wcin to input 中文 works fine.

#include <iostream>
#include <string>
using namespace std;

int main(){
    wstring wstr;
    wcin>>wstr;//Input Chinese is OK
    wcout<<wstr<<endl;
}

How to initialize or assign 中文 to wstring?

Edit: I tried some online compilers. They all compile the code, but they all output "??".

e.g. cpp.sh, jdoodle, onlinegdb, repl.it

Edit 2: I installed g++ i686 MinGW-W64 8.1.0, used Visual Studio to save the cpp file as UTF-8, and then compiled it from the command line. It still outputs nothing.

fmnijk asked Jan 25 '23 18:01



2 Answers

Your compiler clearly doesn't like Unicode characters in its source files. Try initializing your string with Unicode escapes, instead:

wstring wstr = L"\u4E2D\u6587"; // These MAY be the correct codes.

Where 4E2D and 6587 are replaced with the actual hexadecimal values for the characters you want. (Sorry, but I don't have access to a full Unicode table for Chinese characters: I tried pasting them into my compiler, and these are the values it gave me on translating.)

The Unicode values given are for the character string in your question (中文); for the different string in your posted code (你好), use L"\u4F60\u597D".
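As a minimal sketch of the escape approach, the snippet below builds both strings from universal character names only, so the source file stays pure ASCII and its encoding stops mattering. The function names `text_zhongwen` and `text_nihao` are hypothetical, chosen for illustration.

```cpp
#include <string>

// Universal character names (\uXXXX) keep this file pure ASCII,
// so the compiler never has to guess the source encoding.
static_assert(L'\u4E2D' == 0x4E2D, "escape should map to code point U+4E2D");

// Hypothetical helper: the two characters U+4E2D U+6587 from the question.
std::wstring text_zhongwen() { return L"\u4E2D\u6587"; }

// Hypothetical helper: the two characters U+4F60 U+597D from the posted code.
std::wstring text_nihao() { return L"\u4F60\u597D"; }
```

Note that this only fixes how the literal is compiled; whether wcout then displays the characters still depends on the locale and the console.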

Also see the answer by @MarekR.

Adrian Mole answered Jan 29 '23 22:01



This must be a configuration issue!

Apparently your compiler assumes a different encoding than the one your file is written in. Since you are using Windows, the file on your machine is most probably encoded not as UTF-8 but as something else (and you may have copied this file to Linux). Since gcc is more Linux-friendly, it may expect UTF-8, and you have a conflict.

This is a common problem, since Windows for a long time maintained backward compatibility with DOS (where only single-byte characters were allowed and the system used code pages for the respective languages).

As you can see here, most compilers with default settings have no problem with code that uses Chinese characters.
I do not see the TDM-GCC 4.9.2 compiler on godbolt, but it is not a very old gcc after all.

I recommend ensuring that the code is saved as UTF-8 and that the compiler treats the sources as UTF-8 encoded.
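One way to check whether the compiler and the file agree on the encoding is to compare a raw literal against its escaped spelling. This is only a sketch: the helper name `literal_decoded_correctly` is hypothetical, and the build commands in the comments are the GCC and MSVC options I would try first, so verify them against your compiler's documentation.

```cpp
// Save this file as UTF-8. Possible build commands (assumptions, check your docs):
//   g++ -finput-charset=UTF-8 main.cpp
//   cl /utf-8 main.cpp
#include <string>

// Hypothetical helper: returns true when the compiler decoded the raw
// literal to the same code points as the escaped spelling.
bool literal_decoded_correctly() {
    const std::wstring raw = L"你好";              // depends on the source encoding
    const std::wstring escaped = L"\u4F60\u597D";  // independent of the source encoding
    return raw == escaped;
}
```

If the file does not compile at all, or the function returns false, the compiler is most likely reading the source with an encoding other than the one it was saved in.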

Edit: Adding std::locale::global(std::locale("")); made your code display this string properly on godbolt.
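A minimal sketch of that locale setup follows. The helper names are hypothetical, the try/catch fallback is my addition (std::locale("") throws std::runtime_error when the environment names a locale that is not installed), and the extra imbue call is a precaution for standard libraries where wcout does not pick up the global locale on its own.

```cpp
#include <iostream>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: switch to the user's environment locale.
// Returns false if that locale is unavailable; the previous locale is kept.
bool use_environment_locale() {
    try {
        std::locale::global(std::locale(""));
    } catch (const std::runtime_error&) {
        return false;
    }
    // Make wcout use the new global locale explicitly.
    std::wcout.imbue(std::locale());
    return true;
}

// Hypothetical helper: the string from the question, written with escapes.
std::wstring greeting() { return L"\u4F60\u597D"; }

// Sets the locale, then prints the wide string.
void print_greeting() {
    use_environment_locale();
    std::wcout << greeting() << std::endl;
}
```

Calling print_greeting() from main should show the characters when the environment locale and the terminal both support them; on a Windows console further setup may still be needed.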

Marek R answered Jan 29 '23 22:01
