For example, the MessageBox function takes LPCTSTR arguments for the text and caption; LPCTSTR is a pointer to char when _MBCS is defined, or a pointer to wchar_t when _UNICODE is defined.
How does the MessageBox function interpret those strings? As which encoding?
The only explanation I managed to find is this:
http://msdn.microsoft.com/en-us/library/cwe8bzh0(VS.90).aspx
But it doesn't say anything about encoding. It only says that with _UNICODE one character takes up one wchar_t (which is 16-bit on Windows), and that with _MBCS one character takes up one or two chars (8-bit each).
So are these Microsoft-specific versions of UTF-8 and UTF-16 that ignore anything that has to be encoded in three or four bytes in UTF-8, and anything that has to be encoded in four bytes in UTF-16? And is there a way to show anything outside the Basic Multilingual Plane of Unicode with MessageBox?
Windows represents Unicode characters using UTF-16 encoding, in which each character is encoded as one or two 16-bit values. UTF-16 characters are called wide characters, to distinguish them from 8-bit ANSI characters.
The system uses Unicode exclusively for character and string manipulation.
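To make "one or two 16-bit values" concrete, here is a minimal sketch of the standard UTF-16 encoding rule in portable C. The function name utf16_encode is made up for this illustration and is not part of the Windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-16 (hypothetical helper).
 * Writes 1 or 2 code units to out and returns how many were written,
 * or 0 if cp is a surrogate value or lies above U+10FFFF.
 */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                      /* not encodable */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;         /* BMP: one 16-bit unit */
        return 1;
    }
    cp -= 0x10000;                     /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For instance, U+0041 stays a single unit, while U+1F600 (outside the BMP) becomes the surrogate pair D83D DE00.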
The Windows API (application programming interface) allows user-written programs to interact with Windows, for example to display things on screen and get input from mouse and keyboard. All Windows programs except console programs must interact with the Windows API regardless of the language.
Despite the file extension of exe, these actually are dynamic-link libraries. Win32 is the 32-bit application programming interface (API) for versions of Windows from 95 onwards. The API consists of functions implemented, as with Win16, in system DLLs. The core DLLs of Win32 are kernel32.dll, user32.
There are normally two different implementations of each function:
- MessageBoxA, which accepts ANSI strings
- MessageBoxW, which accepts Unicode strings
Here, 'ANSI' means the multi-byte code page currently assigned to the process. This varies according to the user's preferences and locale setting, although Win32 API functions such as WideCharToMultiByte can be counted on to do the right conversion, and the GetACP function will tell you the code page in use. MSDN explains the ANSI code page and how it interacts with Unicode.
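The choice between the two implementations is made at compile time by macros. The following is only a rough, portable imitation of that mechanism, not the real headers: demo_box_a, demo_box_w, demo_box, demo_tchar and DEMO_TEXT are hypothetical stand-ins for MessageBoxA, MessageBoxW, MessageBox, TCHAR and TEXT. In the real headers, windows.h keys the API names on UNICODE and tchar.h keys its types on _UNICODE.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-ins for MessageBoxA / MessageBoxW. Each one just
   reports the size in bytes of one code unit of the string it accepts. */
size_t demo_box_a(const char *text)
{
    assert(text != NULL);
    return sizeof(char);
}

size_t demo_box_w(const wchar_t *text)
{
    assert(text != NULL);
    return sizeof(wchar_t);
}

/* The same switch the Windows headers perform, in miniature. */
#ifdef UNICODE
typedef wchar_t demo_tchar;
#define DEMO_TEXT(s) L##s          /* wide string literal */
#define demo_box     demo_box_w
#else
typedef char demo_tchar;
#define DEMO_TEXT(s) s             /* narrow string literal */
#define demo_box     demo_box_a
#endif
```

A call such as demo_box(DEMO_TEXT("hello")) then compiles against the narrow or the wide variant depending on whether UNICODE is defined, which is why a generic call says nothing about the encoding on its own.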
'Unicode' here means 16-bit code units: historically UCS-2, and UTF-16 since Windows 2000, although support for characters above 0xFFFF isn't consistent across every API. I haven't tried this, but UI functions such as MessageBox in recent versions (later than Windows 2000) should support characters outside the BMP.
The ...A functions are obsolete and only wrap the ...W functions. The former were required for compatibility with Windows 9x, but since that is not used any more, you should avoid them at all costs and use the ...W functions exclusively. They require UTF-16 strings, the only native Windows encoding. All modern Windows versions should support non-BMP characters quite well (if there is a font that has these characters, of course).
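As a rough sketch of what that looks like in practice, the code below passes a string containing U+1F600 (a non-BMP character, written as the surrogate pair D83D DE00) to MessageBoxW. The names kText, utf16_code_points and show_text are made up for this example, and the cast assumes that wchar_t is a 16-bit type, which holds on Windows. The MessageBoxW call is only compiled on Windows; elsewhere show_text does nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* "Hi " followed by U+1F600 as a surrogate pair, NUL-terminated. */
const uint16_t kText[] = { 'H', 'i', ' ', 0xD83D, 0xDE00, 0 };

/* Count code points (not 16-bit units) in a NUL-terminated UTF-16 string.
   A well-formed surrogate pair counts as one code point. */
size_t utf16_code_points(const uint16_t *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (*s) {
        if (s[0] >= 0xD800 && s[0] <= 0xDBFF &&
            s[1] >= 0xDC00 && s[1] <= 0xDFFF)
            s += 2;                 /* high + low surrogate */
        else
            s += 1;                 /* BMP character or lone surrogate */
        n++;
    }
    return n;
}

/* Show kText in a message box. Returns the MessageBoxW result on Windows
   and 0 on other platforms, where nothing is displayed. */
int show_text(void)
{
#ifdef _WIN32
    /* Assumption: wchar_t is 16 bits wide, as it is on Windows. */
    return MessageBoxW(NULL, (const wchar_t *)kText, L"Non-BMP test", MB_OK);
#else
    return 0;
#endif
}
```

The buffer holds five 16-bit units but only four code points. Whether the last one renders as a glyph or as a placeholder box depends on the fonts installed.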