
Why can't my application display Unicode characters correctly?

I decided to convert my Win32 C++ application to a Unicode build, but when I do, I get unreadable characters for Arabic, Chinese, and Japanese text.

First:

If I don't use Unicode, Arabic displays fine in edit boxes and window titles:

HWND hWnd = CreateWindowEx(WS_EX_CLIENTEDGE, "Edit", "ا ب ت ث ج ح خ د ذ", WS_CHILD | WS_VISIBLE | WS_BORDER | ES_MULTILINE, 10, 10, 300, 200, hWnd, (HMENU)100, GetModuleHandle(NULL), NULL);

SetWindowText(hWnd, "صباح الخير");

The output looks correct and works fine (without Unicode).

  • With Unicode:

I added this before including the headers:

#define UNICODE
#include <windows.h>

Now in Window Procedure:

case WM_CREATE:{
    HWND hEdit = CreateWindowExW(WS_EX_CLIENTEDGE, L"Edit", L"ا ب ت ث ج ح خ د ذ", WS_CHILD | WS_VISIBLE | WS_BORDER | ES_MULTILINE, 10, 10, 300, 200, hWnd, (HMENU)100, GetModuleHandle(NULL), NULL);

    // Even when I send a message to change the text, I get unreadable characters!
}
break;
case WM_LBUTTONDBLCLK:{
    SendDlgItemMessageW(hWnd, 100, WM_SETTEXT, 0, (LPARAM)L"السلام عليكم"); // Get unreadable characters also
}
break;

As you can see, with Unicode the controls cannot display Arabic characters correctly.

  • The important detail: after the control is created, if I delete its content manually with backspace and then type Arabic text myself, it displays correctly. So why does it fail when the text is set through functions such as SetWindowTextW()?

Please help. Thank you.

WonFeiHong asked Dec 24 '22 13:12


1 Answer

Make sure to save the source file as UTF-16 or as UTF-8 with a BOM. Otherwise the compiler, like many Windows applications, assumes the ANSI encoding (the default localized Windows code page). Alternatively, check for a compiler switch that forces UTF-8 for source files. For example, the compiler in MS Visual Studio 2015 has a /utf-8 switch, so saving with a BOM is not required.
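A quick way to find out whether a source file already carries a UTF-8 BOM is to look at its first three bytes (EF BB BF). The following is only a minimal sketch in Python: the helper names has_utf8_bom and add_utf8_bom and the file name main.cpp are made up for illustration, and add_utf8_bom assumes the file is already valid UTF-8.

```python
import codecs
import os
import tempfile


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def add_utf8_bom(path):
    """Prepend a UTF-8 BOM if the file does not have one yet.

    Assumption: the file content is already valid UTF-8; it is not re-encoded.
    """
    if has_utf8_bom(path):
        return
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + data)


if __name__ == "__main__":
    # Hypothetical source file written without a BOM, for demonstration only.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "main.cpp")
        with open(path, "w", encoding="utf-8") as f:
            f.write('MessageBoxW(NULL, L"中文", L"", MB_OK);\n')
        print(has_utf8_bom(path))  # False
        add_utf8_bom(path)
        print(has_utf8_bom(path))  # True
```

Most editors can do the same through a "Save with encoding" option; the script is mainly handy for checking many files at once.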

Here's a simple example saved first as UTF-8 and then as UTF-8 w/ BOM, and compiled each time with the Microsoft Visual Studio compiler. Note that there is no need to define UNICODE if you hard-code the W versions of the APIs and use L"" for wide strings:

#include <windows.h>

int main()
{
    MessageBoxW(NULL,L"ا ب ت ث ج ح خ د ذ",L"中文",MB_OK);
}

Result (UTF-8). The compiler assumed ANSI encoding (Windows-1252) and decoded the wide string incorrectly.

Corrupted image

Result (UTF-8 w/ BOM). The compiler detects the BOM and uses UTF-8 to decode the source code, resulting in the correct data generated for the wide strings.

Correct image

A little Python code demonstrating the decode error:

>>> s='中文,ا ب ت ث ج ح خ د ذ'
>>> print(s.encode('utf8').decode('Windows-1252'))
ä¸­æ–‡,Ø§ Ø¨ Øª Ø« Ø¬ Ø­ Ø® Ø¯ Ø°
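
To see what ends up in the wide string in each case, the compiler's decoding step can be modelled roughly as follows. This is a simplified sketch, not the compiler's actual logic: simulate_wide_literal is a hypothetical helper that assumes a UTF-8 BOM selects UTF-8 and that everything else is decoded as Windows-1252, then returns the resulting UTF-16 code units.

```python
import codecs


def simulate_wide_literal(source_bytes, fallback="cp1252"):
    """Model how the bytes of a wide string literal become UTF-16 code units.

    Simplified assumption: a leading UTF-8 BOM selects UTF-8; otherwise the
    bytes are decoded with the `fallback` ANSI code page.
    """
    if source_bytes.startswith(codecs.BOM_UTF8):
        text = source_bytes[len(codecs.BOM_UTF8):].decode("utf-8")
    else:
        text = source_bytes.decode(fallback)
    utf16 = text.encode("utf-16-le")
    # Split the UTF-16LE byte string into 16-bit code units.
    return [int.from_bytes(utf16[i:i + 2], "little") for i in range(0, len(utf16), 2)]


if __name__ == "__main__":
    raw = "中文".encode("utf-8")  # 6 bytes: E4 B8 AD E6 96 87

    # No BOM: every UTF-8 byte turns into its own (wrong) character.
    print([hex(u) for u in simulate_wide_literal(raw)])
    # ['0xe4', '0xb8', '0xad', '0xe6', '0x2013', '0x2021']

    # With BOM: the two intended code points survive.
    print([hex(u) for u in simulate_wide_literal(codecs.BOM_UTF8 + raw)])
    # ['0x4e2d', '0x6587']
```

The control displays whatever code units it is handed, which is consistent with text typed directly into the edit box showing up correctly while the hard-coded literals do not.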
Mark Tolonen answered Dec 28 '22 10:12