
Unicode vs Multi-byte

I'm really confused by this unicode vs multi-byte thing.

Say I'm compiling my program in Unicode (but ultimately, I want a solution that is independent of the character set used).

1) Will all 'char' be interpreted as wide characters?

2) If I have a simple printf statement, i.e. printf("Hello World\n"); with no character strings, can I just leave it be without using _tprintf and _T("...")? If the printf statement includes a character string, then I should use _tprintf and _T("..."), i.e. _tprintf(_T("Hello %s\n"), name); ?

3) If I have a text file (saved in the default format, i.e. without changing the default character set used) that I want to read into a buffer, can I still use char instead of TCHAR? Especially if I'm reading it character by character, i.e. by incrementing the character pointer?

Thank you.

Regards, Rayne

Rayne asked Feb 09 '10 03:02


1 Answer

First, if you're compiling with UNICODE/_UNICODE and don't intend to target other platforms, you can avoid using the TCHAR business and use WCHAR (or wchar_t) and W functions everywhere.
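For context on what "the TCHAR business" does, here is a minimal, portable sketch of the idea. The names MY_UNICODE, my_tchar, MY_T and my_tcslen are hypothetical stand-ins invented for this illustration; the real definitions live in <tchar.h> and are more involved.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for TCHAR, _T() and _tcslen(); not the real <tchar.h>. */
#ifdef MY_UNICODE
typedef wchar_t my_tchar;      /* wide build: each unit is a wchar_t */
#define MY_T(s) L##s           /* turns "abc" into L"abc" */
#define my_tcslen wcslen
#else
typedef char my_tchar;         /* narrow build: each unit is a char */
#define MY_T(s) s
#define my_tcslen strlen
#endif

/* The same source line compiles in either build; only the unit type changes. */
size_t greeting_length(void)
{
    const my_tchar *greeting = MY_T("Hello");
    return my_tcslen(greeting); /* counts units, not bytes */
}
```

If you commit to Unicode builds only, this indirection buys you nothing, and you can write wchar_t, L"..." and the wide functions directly.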

1) Will all 'char' be interpreted as wide characters?

char in C is--by definition--1 byte. (This doesn't technically preclude it from being a "wide character" on platforms where wchar_t is also 1 byte, but given that you're using MSVC and are targeting Windows platforms, that's not going to be the case.)

So for practical purposes, the answer to this is: no.
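A small sketch that makes the size difference visible. The function names are made up for illustration; the sizes themselves come from the compiler, so nothing here depends on the UNICODE setting.

```c
#include <stddef.h>
#include <wchar.h>

/* sizeof(char) is 1 by definition, whatever the build settings are. */
size_t narrow_unit_size(void) { return sizeof(char); }

/* wchar_t is 2 bytes with MSVC; other platforms commonly use 4. */
size_t wide_unit_size(void) { return sizeof(wchar_t); }

/* Both literals hold 5 characters plus a terminator, but the wide one
   needs sizeof(wchar_t) bytes per element. */
size_t narrow_literal_bytes(void) { return sizeof("Hello"); }
size_t wide_literal_bytes(void)   { return sizeof(L"Hello"); }
```

A plain "..." literal stays an array of char even in a Unicode build; only literals written as L"..." (or wrapped in _T) become wide.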

2) If I have a simple printf statement, i.e. printf("Hello World\n"); with no character strings, can I just leave it be without using _tprintf and _T("...")? If the printf statement includes a character string, then I should use _tprintf and _T("..."), i.e. _tprintf(_T("Hello %s\n"), name); ?

If you're printing ASCII string literals, you can continue using printf.

If you're printing arbitrary strings that could lie outside of the ASCII range, you should use _tprintf (or wprintf).
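As a rough sketch of the wide route, here is the same greeting built with the standard wide formatting function swprintf. The helper name format_greeting is hypothetical, and the example assumes the standard C signature of swprintf (the one that takes a buffer size).

```c
#include <stddef.h>
#include <wchar.h>

/* Formats "Hello <name>\n" into out using the wide printf family.
   %ls means "wide string" in standard C. By default MSVC also treats a
   plain %s as a wide string in the wide functions, which is non-standard,
   so %ls is the safer spelling.
   Returns the number of wide characters written (excluding the
   terminator), or a negative value if the buffer is too small. */
int format_greeting(wchar_t *out, size_t capacity, const wchar_t *name)
{
    return swprintf(out, capacity, L"Hello %ls\n", name);
}
```

Note that the format string itself must be a wide literal (L"..." or _T("...") with _tprintf); passing a narrow literal to a wide function is a type error.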

3) If I have a text file (saved in the default format, i.e. without changing the default character set used) that I want to read into a buffer, can I still use char instead of TCHAR? Especially if I'm reading it character by character, i.e. by incrementing the character pointer?

What is "the default format"?

When you're reading an external file, read the first few bytes to check for a UTF-8 or UTF-16 BOM, and base your decisions on that.
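A minimal sketch of that check, assuming you have already read the first bytes of the file into a buffer. The bom_kind type and detect_bom function are hypothetical names for this example, and UTF-32 is deliberately ignored.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch. */
typedef enum {
    BOM_NONE,      /* no recognized BOM: encoding is still unknown */
    BOM_UTF8,      /* EF BB BF */
    BOM_UTF16_LE,  /* FF FE */
    BOM_UTF16_BE   /* FE FF */
} bom_kind;

/* Inspects the first bytes of a file for a UTF-8 or UTF-16 BOM.
   Caveat: the UTF-32 LE BOM (FF FE 00 00) also starts with FF FE and
   would be reported as UTF-16 LE here; this sketch does not handle it. */
bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16_LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16_BE;
    return BOM_NONE;
}
```

BOM_NONE only means no marker was found. Such a file may still be UTF-8 without a BOM, plain ASCII, or text in some legacy code page, so you need another way to decide in that case.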

jamesdlin answered Sep 22 '22 12:09