I have a few questions in trying to understand different encodings.
What is the default encoding for strings?
char ascii[]= "Some text"; // This is plain ASCII right?
wchar_t utf[] = L"Some Text"; // Is this UTF-16? Or ASCII stored in wchar_t's?
MessageBoxW(NULL, L"Hello", L"HI", MB_OK); // What encodings are the 2 strings in?
And then, how would I create a UTF-8 string, for example if I wanted to display UTF-8 characters in a MessageBox?
My questions are mostly directed at Windows by the way, but if it's different on different OSes I'm interested to know.
The standard doesn't specify the encoding for either narrow or wide strings. The vendor will normally aim for something that's not surprising on the target machine, but it's hard to say more than that. This means, for example, the narrow string would probably use ASCII (or, really, something like ISO-8859) on most personal computers, but EBCDIC on an IBM mainframe.
The wide character strings vary as well -- for example, most compilers on Windows would use UTF-16. On Linux, UTF-32/UCS-4 is probably more common.
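One way to see what a particular compiler does is to look at the code units it actually stores. The following is only a sketch: the helper names code_units and likely_wide_encoding are made up for illustration, and the "likely" guess rests on the assumption that a 2-byte wchar_t means UTF-16 and a 4-byte wchar_t means UTF-32, which the standard does not guarantee.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copy the code units of a wide string into integers
// so they can be printed or compared.
std::vector<unsigned long> code_units(const std::wstring& s) {
    std::vector<unsigned long> out;
    for (wchar_t c : s) {
        out.push_back(static_cast<unsigned long>(c));
    }
    return out;
}

// Hypothetical helper: guess the wide encoding from the size of wchar_t.
// This is a heuristic, not something the language promises.
const char* likely_wide_encoding() {
    if (sizeof(wchar_t) == 2) return "probably UTF-16 (typical on Windows)";
    if (sizeof(wchar_t) == 4) return "probably UTF-32 (typical on Linux)";
    return "unknown";
}
```

A character outside the Basic Multilingual Plane makes the difference visible: code_units(L"\U0001F600") should hold two units (a surrogate pair) where wchar_t is 16 bits wide, and a single unit where it is 32 bits wide.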
The mention of MessageBox suggests Windows, where (as you've surmised) you'll normally have UTF-16 for wide strings. In this case, if you explicitly specify wide strings, you also want to explicitly specify the wide version of the function -- MessageBoxW.
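As far as creating a UTF-8 string literal goes, it is up to the compiler. C++11 added the u8 prefix (u8"Some text") for exactly this purpose, and Visual Studio supports it from the 2015 release onward; for older versions, if there's a way to get a UTF-8 literal, I'm not aware of it. Either way, MessageBoxW expects UTF-16, so a UTF-8 string has to be converted (for example with MultiByteToWideChar and CP_UTF8) before it can be displayed.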
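To make the relationship between UTF-8 bytes and UTF-16 code units concrete, here is a minimal, portable sketch of the conversion. The function name utf8_to_utf16 is made up for illustration; the bytes are spelled out with \x escapes so the example does not depend on the source file's encoding. It rejects bad lead bytes, bad continuation bytes, and truncated input, but it does not check for overlong forms or encoded surrogates, so treat it as a demonstration rather than a validating decoder.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i <= extra) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else if (cp <= 0x10FFFF) {
            // Code points above U+FFFF become a surrogate pair in UTF-16.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            throw std::invalid_argument("code point out of range");
        }
    }
    return out;
}
```

For instance, utf8_to_utf16("\xE2\x82\xAC") yields the single unit 0x20AC (the euro sign). On Windows, where wchar_t is 16 bits wide, the system call MultiByteToWideChar with CP_UTF8 performs the same job and produces a buffer that can be passed straight to MessageBoxW.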