Why can't my application display Unicode characters correctly?

I decided to convert my Win32 C++ application to a Unicode build, but now I get unreadable characters for Arabic, Chinese, and Japanese text.


If I don't use Unicode, Arabic text displays correctly in edit boxes and window titles:

HWND hEdit = CreateWindowEx(WS_EX_CLIENTEDGE, "Edit", "ا ب ت ث ج ح خ د ذ", WS_CHILD | WS_VISIBLE | WS_BORDER | ES_MULTILINE, 10, 10, 300, 200, hWnd, (HMENU)100, GetModuleHandle(NULL), NULL);

SetWindowText(hWnd, "صباح الخير");

The output looks fine and everything works (without Unicode).

  • With Unicode:

I added this before including the headers:

#define UNICODE
#include <windows.h>

Now, in the window procedure:

case WM_CREATE:{
    HWND hEdit = CreateWindowExW(WS_EX_CLIENTEDGE, L"Edit", L"ا ب ت ث ج ح خ د ذ", WS_CHILD | WS_VISIBLE | WS_BORDER | ES_MULTILINE, 10, 10, 300, 200, hWnd, (HMENU)100, GetModuleHandle(NULL), NULL);

    // Even when I send a message to change the text, I get unreadable characters!
    SendDlgItemMessageW(hWnd, 100, WM_SETTEXT, 0, (LPARAM)L"السلام عليكم"); // Also shows unreadable characters

As you can see, with Unicode the controls cannot display Arabic characters correctly.

  • The important part: after the control is created, if I delete its content manually with Backspace and then type Arabic text by hand, it displays correctly. So why does it fail when I set the text through functions like SetWindowTextW()?

Please Help. Thank you.

1 Answer

Make sure to save the source file as UTF-16 or UTF-8 with BOM. Many Windows applications assume the ANSI encoding (default localized Windows code page) otherwise. You can also check compiler switches to force using UTF-8 for source files. For example, MS Visual Studio 2015's compiler has a /utf-8 switch so saving with BOM is not required.
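
To check whether a source file already starts with a UTF-8 BOM, or to add one, a small script can help. This is only a minimal sketch: the function names `has_utf8_bom` and `add_utf8_bom` are made up for illustration, and it assumes the file is already valid UTF-8 (it raises an error otherwise rather than guessing the encoding).

```python
import codecs

def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 byte order mark (EF BB BF)."""
    with open(path, 'rb') as f:
        return f.read(3) == codecs.BOM_UTF8

def add_utf8_bom(path):
    """Prepend a UTF-8 BOM if it is missing. Returns True if the file was changed.

    Assumption: the file content is valid UTF-8.
    """
    with open(path, 'rb') as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        return False  # already has a BOM, nothing to do
    data.decode('utf-8')  # raises UnicodeDecodeError if the file is not UTF-8
    with open(path, 'wb') as f:
        f.write(codecs.BOM_UTF8 + data)
    return True
```

Most editors can do the same thing through a "Save with encoding" option, so a script like this is mainly useful for checking many files at once.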

Here's a simple example, saved first as UTF-8 without a BOM and then as UTF-8 with a BOM, and compiled with the Microsoft Visual Studio compiler. Note that there is no need to define UNICODE if you hard-code the W versions of the APIs and use L"" for wide strings:

#include <windows.h>

int main()
{
    MessageBoxW(NULL,L"ا ب ت ث ج ح خ د ذ",L"中文",MB_OK);
}

Result (UTF-8). The compiler assumed ANSI encoding (Windows-1252) and decoded the wide string incorrectly.

Corrupted image

Result (UTF-8 w/ BOM). The compiler detects the BOM and uses UTF-8 to decode the source code, resulting in the correct data generated for the wide strings.

Correct image

A little Python code demonstrating the decode error:

>>> s='中文,ا ب ت ث ج ح خ د ذ'
>>> print(s.encode('utf8').decode('Windows-1252'))
ä¸­æ–‡,Ø§ Ø¨ Øª Ø« Ø¬ Ø­ Ø® Ø¯ Ø°
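
The same effect can be seen for a single character. The sketch below (the helper name `misdecoded_code_units` is hypothetical) lists the code points that end up in the wide string when UTF-8 source bytes are decoded with the wrong code page. It assumes the compiler falls back to Windows-1252, as in the example above; on a machine with a different ANSI code page the garbage would look different.

```python
def misdecoded_code_units(text, wrong_codec='cp1252'):
    """Encode text as UTF-8, decode the bytes with the wrong codec,
    and return the resulting code points as 'U+XXXX' strings."""
    garbled = text.encode('utf-8').decode(wrong_codec)
    return [f'U+{ord(ch):04X}' for ch in garbled]

# The Arabic letter alef (U+0627) is two bytes in UTF-8 (D8 A7), so the
# misdecoded wide string holds two unrelated characters instead of one.
print(misdecoded_code_units('ا'))            # ['U+00D8', 'U+00A7']
print([f'U+{ord(ch):04X}' for ch in 'ا'])    # ['U+0627'], what the control should receive
```

This would also explain why text typed into the edit box by hand shows up correctly: it never passes through the compiler's source decoding, so only the string literals are affected.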
