
What encoding do Win32 API functions expect?

For example, the MessageBox function takes LPCTSTR arguments for text and caption, which is a pointer to wchar or a pointer to char when _UNICODE or _MBCS is defined, respectively.

How does the MessageBox function interpret those strings? As which encoding?

The only explanation I managed to find is this:

http://msdn.microsoft.com/en-us/library/cwe8bzh0(VS.90).aspx

But it doesn't say anything about encoding, just that in case of _UNICODE one character takes up one wchar (which is 16-bit on Windows) and that in case of _MBCS one character takes up one or two chars (8-bit each).

So are those Microsoft's own versions of UTF-8 and UTF-16 that ignore anything that has to be encoded in 3 or 4 bytes in case of UTF-8 and anything that has to be encoded in 4 bytes in case of UTF-16? And is there a way to show anything outside the Basic Multilingual Plane of Unicode with MessageBox?

Bojan asked Nov 10 '10 09:11



2 Answers

There are normally two different implementations of each function:

  • MessageBoxA, which accepts ANSI strings
  • MessageBoxW, which accepts Unicode strings

Here, 'ANSI' means the multi-byte code page currently assigned to the process. This varies according to the user's preferences and locale setting, although Win32 API functions such as WideCharToMultiByte can be counted on to do the right conversion, and the GetACP function will tell you the code page in use. MSDN explains the ANSI code page and how it interacts with Unicode.
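To make the code page dependence concrete, here is a minimal sketch in C. The helper `ansi_byte_to_codepoint` is hypothetical (it is not a Win32 function) and models only two code pages, 1252 and 1251, for ASCII and the bytes 0xE0..0xFF. The Windows-only part assumes a lone byte is a complete character, which does not hold for double-byte code pages.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of Win32: decode one byte under an ANSI
   code page. Only code pages 1252 and 1251 are modeled, and only ASCII
   plus bytes 0xE0..0xFF; anything else yields U+FFFD. */
uint32_t ansi_byte_to_codepoint(unsigned codepage, unsigned char b)
{
    assert(codepage == 1252 || codepage == 1251);
    if (b < 0x80)
        return b;                        /* ASCII is the same in both */
    if (b >= 0xE0) {
        if (codepage == 1252)
            return b;                    /* matches Latin-1 in this range */
        if (codepage == 1251)
            return 0x0430u + (b - 0xE0u); /* Cyrillic small letters */
    }
    return 0xFFFD;                       /* not modeled in this sketch */
}

#ifdef _WIN32
#include <windows.h>

/* Windows only: let the system decode one byte under the process's
   active ANSI code page (the one GetACP reports). */
wchar_t decode_with_acp(unsigned char b)
{
    char in = (char)b;
    wchar_t out = 0;
    int n = MultiByteToWideChar(CP_ACP, 0, &in, 1, &out, 1);
    return n == 1 ? out : (wchar_t)0xFFFD;
}
#endif
```

The same byte 0xE9 comes out as U+00E9 (é) under code page 1252 and as U+0439 (й) under code page 1251, which is why text passed to MessageBoxA can display differently on machines with different locale settings.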

'Unicode' means 16-bit wide characters: UCS-2 originally, and UTF-16 since Windows 2000, although support for characters above 0xFFFF isn't consistent. I haven't tried this, but UI functions such as MessageBox in recent versions (newer than Windows 2000) should support characters outside the BMP.
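For reference, this is roughly what a character above 0xFFFF looks like in UTF-16: it is split into two 16-bit units (a surrogate pair), which UCS-2 cannot express. The sketch below is an illustration only; `utf16_encode` and `show_non_bmp` are made-up names, and whether the glyph actually renders depends on the Windows version and installed fonts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
   Writes 1 or 2 code units to out and returns the count, or 0 if cp is
   a surrogate value or lies above U+10FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t *out)
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {                 /* BMP: one unit, same as UCS-2 */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate */
    return 2;
}

#ifdef _WIN32
#include <windows.h>

/* Windows only: pass U+1F600 (outside the BMP) to MessageBoxW. */
int show_non_bmp(void)
{
    uint16_t units[2];
    wchar_t text[3] = {0};              /* wchar_t is 16-bit on Windows */
    size_t n = utf16_encode(0x1F600, units);
    for (size_t i = 0; i < n; ++i)
        text[i] = (wchar_t)units[i];
    return MessageBoxW(NULL, text, L"Non-BMP test", MB_OK);
}
#endif
```

U+1F600 becomes the two units 0xD83D 0xDE00, while a BMP character such as U+20AC stays a single unit.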

Tim Robinson answered Oct 14 '22 03:10


The ...A functions are obsolete and only wrap the ...W functions. The former were required for compatibility with Windows 9x, but since that is not used any more, you should avoid them at all costs and use the ...W functions exclusively. They require UTF-16 strings, the only native Windows encoding. All modern Windows versions should support non-BMP characters quite well (if there is a font that has these characters, of course).
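If your program holds its text as UTF-8, it has to be converted to UTF-16 before it reaches a ...W function. A rough sketch, assuming well-formed input: `utf8_to_utf16` and `message_box_utf8` are hypothetical names, the portable converter does not reject overlong forms or encoded surrogates, and on Windows you would normally let MultiByteToWideChar with CP_UTF8 do the work, as in the guarded part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a NUL-terminated UTF-8 string to UTF-16 code units.
   Returns the number of units written (terminator excluded), or -1 if
   the input is malformed or dst (capacity cap units) is too small. */
long utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;
    while (*s) {
        uint32_t cp;
        int extra;
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return -1;               /* invalid lead byte */
        ++s;
        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80)  /* also catches an early NUL */
                return -1;
            cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
        }
        if (cp > 0x10FFFF)
            return -1;
        size_t need = cp >= 0x10000 ? 2 : 1;
        if (n + need + 1 > cap)       /* keep room for the terminator */
            return -1;
        if (need == 1) {
            dst[n++] = (uint16_t)cp;
        } else {                      /* non-BMP: surrogate pair */
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 + (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        }
    }
    if (n + 1 > cap)
        return -1;
    dst[n] = 0;
    return (long)n;
}

#ifdef _WIN32
#include <windows.h>

/* Windows only: the same conversion done by the system, then MessageBoxW.
   The 256-unit buffer is an arbitrary limit chosen for this sketch. */
int message_box_utf8(const char *text_utf8)
{
    wchar_t wide[256];
    if (MultiByteToWideChar(CP_UTF8, 0, text_utf8, -1, wide, 256) == 0)
        return 0;
    return MessageBoxW(NULL, wide, L"UTF-8 input", MB_OK);
}
#endif
```

A 4-byte UTF-8 sequence turns into two UTF-16 units, so the wide buffer is sized in 16-bit units, not in characters.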

Philipp answered Oct 14 '22 05:10