Are there any downsides to using '?'-style character literals to compare against, or assign to, values known to be of type wchar_t, instead of using L'?'-style literals?
They have the wrong datatype and encoding, so that's a bad idea. The compiler will silently widen character literals (for strings you'd get a type mismatch compile error), using the standard integral conversions (such as sign-extension). But the widened value might not match the value of the corresponding wide literal.
For example, characters 0x80 through 0xff often map to different Unicode codepoints, and the exact mapping varies depending on the compiler's codepage.
Clearly, it's not possible for Unicode to map all the various codepages using an identity conversion. If merely widening were enough, there'd be no need for functions like mbstowcs.
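As a rough sketch of what such a conversion involves, the helper below (widen is a hypothetical name, not something from the question) passes a narrow string through std::mbstowcs, which decodes it according to the current C locale instead of widening each byte. It assumes the locale has already been selected with std::setlocale; a program that never calls it stays in the "C" locale, where only the basic character set is sure to convert.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: convert a narrow multibyte string to a wide string
// using the current C locale. Returns an empty string on invalid input.
inline std::wstring widen(const std::string& narrow) {
    if (narrow.empty()) return std::wstring();

    // A wide character consumes at least one byte of input, so the result
    // can never need more elements than the input has bytes.
    std::wstring wide(narrow.size(), L'\0');
    std::size_t written = std::mbstowcs(&wide[0], narrow.c_str(), wide.size());

    // mbstowcs reports an invalid multibyte sequence as (size_t)-1.
    if (written == static_cast<std::size_t>(-1)) return std::wstring();

    assert(written <= wide.size());
    wide.resize(written);
    return wide;
}
```

Calling widen("abc") in the default locale should give L"abc"; what bytes 0x80 through 0xff turn into depends entirely on the locale in effect.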
WRT your specific question about '\xAB' vs L'\xAB', they probably are not equal. See http://ideone.com/b1E39
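A minimal sketch of that comparison, assuming a typical implementation where a signed char holds '\xAB' as a negative value; the function names are made up for illustration. Whether the two literals compare equal then comes down to the signedness of plain char.

```cpp
#include <cassert>
#include <climits>

// True when plain char is a signed type on this implementation.
inline bool char_is_signed() {
    return CHAR_MIN < 0;
}

// Value a wchar_t ends up with after being assigned the narrow literal.
// The literal has type char, so a negative value is carried over by the
// integral conversion instead of becoming 0xAB.
inline long long narrow_then_widened() {
    wchar_t w = '\xAB';
    return static_cast<long long>(w);
}

// Value of the wide literal; a hex escape keeps its numeric value.
inline long long wide_literal() {
    return static_cast<long long>(L'\xAB');
}

// Direct comparison of the two spellings, as asked in the question.
inline bool literals_match() {
    return '\xAB' == L'\xAB';
}

// Sanity check that does not depend on the signedness of char.
inline void check_wide_value() {
    assert(wide_literal() == 0xAB);
}
```

Where plain char is signed (the usual case on x86), literals_match() is expected to be false; where plain char is unsigned it is expected to be true, which is exactly why relying on the narrow spelling is fragile.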
As I mentioned, the standard says
A char array (whether plain char, signed char, or unsigned char), char16_t array, char32_t array, or wchar_t array can be initialized by a narrow character literal...
However, in the section for the __STDC_MB_MIGHT_NEQ_WC__ preprocessor definition, it says
The integer constant 1, intended to indicate that, in the encoding for wchar_t, a member of the basic character set need not have a code value equal to its value when used as the lone character in an ordinary character literal.
And for __STDC_ISO_10646__:
An integer constant of the form yyyymmL (for example, 199712L). If this symbol is defined, then every character in the Unicode required set, when stored in an object of type wchar_t, has the same value as the short identifier of that character.
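Both macros can be queried at compile time. The sketch below wraps them in small functions (the names are hypothetical) so a program can report what its implementation promises about wchar_t; it assumes that an implementation which leaves __STDC_MB_MIGHT_NEQ_WC__ undefined gives basic characters the same value in both encodings, which is the intent of the quoted wording.

```cpp
#include <cassert>

// True if the implementation states that wchar_t values are
// ISO 10646 (Unicode) code points.
inline bool wchar_is_iso10646() {
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}

// True if a basic character may have different values as char and wchar_t.
inline bool basic_chars_may_differ() {
#ifdef __STDC_MB_MIGHT_NEQ_WC__
    return true;
#else
    return false;
#endif
}

// Compares the narrow and wide spellings of a basic character.
inline bool question_mark_matches() {
    return static_cast<wchar_t>('?') == L'?';
}

// If the implementation does not warn about a mismatch, the narrow and
// wide spellings of a basic character are expected to agree.
inline void check_basic_set() {
    if (!basic_chars_may_differ()) {
        assert(question_mark_matches());
    }
}
```

Even when check_basic_set() passes, that only covers the basic character set; it says nothing about values such as '\xAB'.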
I am not exactly a professional at interpreting the standard, but I think that means the answer to your question is that they may have different representations, and you should always use the L prefix.
The only downside is that your program might fail on stone-age systems using EBCDIC. On any real-world system worth consideration, char and wchar_t values for the portable character set are all ASCII, and on increasingly many (but not all), wchar_t is a Unicode codepoint number.