
C/C++ encoding questions

I have a few questions in trying to understand different encodings.

What is the default encoding for strings?

char ascii[]= "Some text"; // This is plain ASCII right?
wchar_t utf[] = L"Some Text"; // Is this UTF-16? Or ASCII stored in wchar_t's?
MessageBoxW(NULL, L"Hello", L"HI", MB_OK); // What encodings are the 2 strings in?

And then, how would I create a UTF-8 string, for example if I wanted to display UTF-8 characters in a MessageBox?

My questions are mostly directed at Windows by the way, but if it's different on different OSes I'm interested to know.

Josh asked Mar 15 '12 05:03



1 Answer

The C and C++ standards don't specify the encoding for either narrow or wide strings. The vendor will normally aim for something that's not surprising on the target machine, but the standards guarantee little beyond that. This means, for example, that a narrow string would probably use ASCII (or, really, an ASCII superset such as ISO-8859) on most personal computers, but EBCDIC on an IBM mainframe.
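To make that concrete, here is a minimal sketch of how a program could guess which narrow character set its compiler used, by looking at the numeric values of a couple of letters. The function name `narrow_charset_guess` is made up for illustration, and the check is only a heuristic: it cannot tell ASCII apart from ASCII supersets such as ISO-8859 or UTF-8.

```cpp
// Hypothetical helper: guess the narrow execution character set from the
// numeric values the compiler assigned to 'A' and 'a'.
const char* narrow_charset_guess() {
    const unsigned char upper = static_cast<unsigned char>('A');
    const unsigned char lower = static_cast<unsigned char>('a');

    // ASCII and its supersets place 'A' at 0x41 and 'a' at 0x61.
    if (upper == 0x41 && lower == 0x61) return "ASCII-compatible";

    // EBCDIC places 'A' at 0xC1 and 'a' at 0x81.
    if (upper == 0xC1 && lower == 0x81) return "EBCDIC";

    return "unknown";
}
```

On a typical desktop compiler this should report an ASCII-compatible set. It says nothing about which code page is used for characters above 0x7F.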

The wide character strings vary as well -- for example, most compilers on Windows would use UTF-16. On Linux, UTF-32/UCS-4 is probably more common.
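One rough way to see the difference is to look at the width of `wchar_t`. The sketch below assumes that a 16-bit `wchar_t` means UTF-16 and a 32-bit one means UTF-32, which matches common Windows and Linux toolchains but is not guaranteed by the standard. The helper names are hypothetical.

```cpp
#include <climits>
#include <cstddef>

// Width of wchar_t in bits on the current platform.
constexpr std::size_t wide_char_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// Heuristic only: the standard does not tie the width to an encoding.
const char* likely_wide_encoding() {
    switch (wide_char_bits()) {
        case 16: return "UTF-16";   // typical for Windows compilers
        case 32: return "UTF-32";   // typical for Linux compilers
        default: return "unknown";
    }
}
```

Code that must behave the same on both kinds of platform is probably better off with `char16_t` or `char32_t`, whose widths do not vary this way.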

The mention of MessageBox suggests Windows, where (as you've surmised) you'll normally have UTF-16 for wide strings. In this case, if you explicitly specify wide strings, you also want to explicitly specify the wide version of the function -- MessageBoxW.

As far as creating a UTF-8 string literal goes, about all I can say is "good luck". It would be up to Visual Studio to do that, but if there's a way to get it to do that, I'm not aware of it.
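For what it's worth, C++11 added a `u8` string-literal prefix (its element type became `char8_t` in C++20), and UTF-8 bytes can always be spelled out with hex escapes regardless of compiler settings. MessageBoxW still wants UTF-16, so the UTF-8 text has to be converted first. On Windows, `MultiByteToWideChar` with `CP_UTF8` does that. The sketch below is a portable stand-in that shows what the conversion involves. The name `utf8_to_utf16` is made up for illustration, and the code assumes well-formed input.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decode UTF-8 bytes into UTF-16 code units.
// Sketch only: it does not reject overlong encodings or stray
// continuation bytes, and it stops at a truncated trailing sequence.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // code point being assembled
        std::size_t len = 0;    // total bytes in this sequence

        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }

        if (i + len > in.size()) break;  // truncated tail: stop decoding

        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        i += len;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For example, `utf8_to_utf16("h\xC3\xA9")` takes the UTF-8 bytes for "hé" and yields the two UTF-16 code units `u'h'` and `0x00E9`. On Windows, where `wchar_t` is 16 bits wide, the same code units are what the wide version of MessageBox expects.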

Jerry Coffin answered Oct 16 '22 21:10
