I am in the process of learning C++ and came across an article on the MSDN here:
http://msdn.microsoft.com/en-us/magazine/dd861344.aspx
In the first code example the one line of code which my question relates to is the following:
VERIFY(SetWindowText(L"Direct2D Sample"));
More specifically, that L prefix. I had a little read up, and correct me if I am wrong :-), but this is to allow for Unicode strings, i.e. it marks the literal as a wide-character string. During my read up on this I came across another article on Advanced String Techniques in C here: http://www.flipcode.com/archives/Advanced_String_Techniques_in_C-Part_I_Unicode.shtml
It says there are a few options, including defining one of the following:
#define UNICODE
OR
#define _UNICODE
in C; again, point out if I am wrong, I appreciate your feedback. Further, it shows the datatype suitable for these Unicode strings being:
wchar_t
It throws into the mix a macro and a kind of hybrid datatype, the macro being:
_TEXT(t)
which simply prefixes the string literal with the L, and the hybrid data type being:
TCHAR
which it points out will allow for Unicode if the define is there and ASCII if not. Now my question is, or more of an assumption which I would like to confirm: would Microsoft use this TCHAR data type, which is more flexible, or is there any benefit to committing to using wchar_t?
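To check that I am reading that right, here is my rough sketch of how those expand (my own example, not from the article):

#include <tchar.h>   // defines TCHAR and the _TEXT()/_T() macros

// With _UNICODE defined:   TCHAR is wchar_t and _TEXT("Sample") becomes L"Sample"
// Without it:              TCHAR is char    and _TEXT("Sample") stays   "Sample"
const TCHAR* title = _TEXT("Direct2D Sample");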
Also, when I say "does Microsoft use this", I mean more specifically in, for example, the ATL and WTL libraries. Does anyone of you have a preference or some advice regarding this?
Cheers,
Andrew
Microsoft was one of the first companies to implement Unicode in their products.
Unicode is a standard encoding system that is used to represent characters from almost all languages. Every Unicode character is encoded using a unique integer code point between 0 and 0x10FFFF. A Unicode string is a sequence of zero or more code points.
The character set most commonly used in computers today is Unicode, a global standard for character encoding. Internally, Windows applications use the UTF-16 implementation of Unicode. In UTF-16, most characters are identified by two-byte codes.
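A small sketch (my own example) of what "two-byte codes" means in practice, assuming Windows' 16-bit wchar_t:

#include <stdio.h>
#include <wchar.h>

int main(void)
{
    // Most characters occupy a single 16-bit code unit in UTF-16...
    const wchar_t* latin = L"A";
    // ...but characters outside the Basic Multilingual Plane, such as
    // U+1F600, take a surrogate pair, i.e. two 16-bit code units.
    const wchar_t* emoji = L"\U0001F600";

    // On Windows, where wchar_t is 16 bits, this prints 1 and 2.
    wprintf(L"%u %u\n", (unsigned)wcslen(latin), (unsigned)wcslen(emoji));
    return 0;
}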
For all new software you should define UNICODE and use wchar_t directly. Using ANSI strings will come back to haunt you.
You should just use wchar_t and the wide versions of all the CRT functions (e.g. wcscmp instead of strcmp). The TEXT macros, TCHAR, etc. exist only for code that needs to work in both ANSI and UNICODE environments, which I feel code rarely needs to do.
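A minimal sketch of the narrow vs. wide CRT usage being described (my own example):

#include <stdio.h>
#include <string.h>
#include <wchar.h>

int main(void)
{
    // Narrow (ANSI) strings use char, "" literals and the str* functions.
    const char* narrow = "Direct2D Sample";

    // Wide strings use wchar_t, L"" literals and the wcs* functions.
    const wchar_t* wide = L"Direct2D Sample";

    if (strcmp(narrow, "Direct2D Sample") == 0 &&
        wcscmp(wide, L"Direct2D Sample") == 0)
    {
        wprintf(L"both comparisons matched\n");
    }
    return 0;
}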
When you create a new Windows application using Visual Studio, UNICODE is automatically defined and wchar_t will work like a built-in type.
Short answer: the hybrid infrastructure with the TCHAR type, the _TEXT() macro and the various _t* functions (_tcscpy comes to mind) is a throwback to the times when Microsoft had two platforms coexisting: the Windows NT line, which used Unicode strings internally, and the Windows 9x line, which used ANSI strings.
String representation here means that all the Windows APIs that expected or returned strings to your app used one or the other representation for those strings. COM added even more confusion, as it was available on both platforms -- and expected Unicode strings on both!
In those old times you were encouraged to write "portable" code: you were instructed to use the hybrid infrastructure for your strings so that you could compile for both models just by defining/undefining UNICODE and/or _UNICODE for your app.
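For illustration, a sketch (mine, not from the original article) of what that "portable" style looked like; the same source compiles as ANSI or Unicode depending on whether UNICODE/_UNICODE are defined:

#include <windows.h>
#include <tchar.h>

int main(void)
{
    // TCHAR, _T()/_TEXT() and the _tcs* functions expand to the wide forms
    // (wchar_t, L"", wcs*) when UNICODE/_UNICODE are defined, and to the
    // narrow forms (char, "", str*) when they are not.
    TCHAR title[32];
    _tcscpy_s(title, 32, _T("Direct2D Sample"));

    // Win32 calls such as MessageBox resolve to MessageBoxW or MessageBoxA
    // through the same mechanism.
    MessageBox(NULL, title, _T("Demo"), MB_OK);
    return 0;
}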
As the Windows 9x line is no longer relevant (for the vast majority of apps, anyway), you can safely ignore the ANSI world and use Unicode strings directly.
Beware, though, that Unicode has multiple representations today: as pointed out above, the convention implied by wchar_t on Windows is UTF-16 (16-bit code units; the older assumption of exactly one 16-bit word per character is UCS-2). There are other, widely used representations, such as UTF-8, where a character does not map to a single 16-bit unit.
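If you do need to move between representations (say, UTF-16 in memory and UTF-8 in files or on the wire), the Win32 conversion functions handle it. A sketch of the UTF-16 to UTF-8 direction; ToUtf8 is just my own helper name:

#include <windows.h>
#include <string>

// Convert a UTF-16 (wchar_t) string to UTF-8 using the Win32 API.
std::string ToUtf8(const std::wstring& wide)
{
    if (wide.empty()) return std::string();
    int bytes = WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), (int)wide.size(),
                                    NULL, 0, NULL, NULL);
    std::string utf8(bytes, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), (int)wide.size(),
                        &utf8[0], bytes, NULL, NULL);
    return utf8;
}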