Any downsides using '?' instead of L'?' with wchar_t?

Question

Are there any downsides to using '?'-style character literals to compare against, or assign to, values known to be of type wchar_t, instead of using L'?'-style literals?

Ben Voigt · Accepted Answer

They have the wrong datatype and encoding, so that's a bad idea. The compiler will silently widen a narrow character literal using the standard integral conversions (such as sign extension), whereas for strings you'd get a type-mismatch compile error. But the widened value might not match the wide character you meant.
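A minimal sketch of what that silent widening amounts to. The helper names `widen_by_conversion` and `promoted_value` are made up for illustration; they only spell out the conversions the compiler applies on its own.

```cpp
#include <type_traits>

// An unprefixed literal has type char; an L-prefixed literal has type wchar_t.
static_assert(std::is_same<decltype('?'), char>::value, "narrow literal is char");
static_assert(std::is_same<decltype(L'?'), wchar_t>::value, "wide literal is wchar_t");

// Hypothetical helper: what effectively happens when a char value is assigned
// to a wchar_t. This is a plain integral conversion, with no character-set
// translation of any kind.
inline wchar_t widen_by_conversion(char c) {
    return static_cast<wchar_t>(c);
}

// Hypothetical helper: the numeric value a char contributes to a comparison
// after integral promotion. It is sign-extended when plain char is signed.
inline int promoted_value(char c) {
    return c;
}
```

On an ASCII-based implementation `widen_by_conversion('?') == L'?'` holds, but nothing in the conversion itself guarantees that for characters outside the basic set.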

For example, narrow characters 0x80 through 0xff often map to Unicode code points that differ from their numeric values, and the exact mapping depends on the compiler's codepage.

Clearly, it's not possible for Unicode to map all the various codepages using an identity conversion. If merely widening were enough, there'd be no need for functions like mbstowcs.
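For contrast, here is a rough sketch of a locale-aware conversion using the standard `std::mbstowcs`. The wrapper name `widen_string` is hypothetical, and the result depends on whatever locale is active when it is called (the "C" locale at program start).

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical wrapper: convert a narrow multibyte string to a wide string
// according to the current C locale. Returns an empty string if the input
// contains an invalid multibyte sequence (so an empty result is ambiguous
// for empty input; a real API would report errors separately).
inline std::wstring widen_string(const char* narrow) {
    // The wide result never has more characters than the input has bytes,
    // so strlen + 1 leaves room for the terminating null as well.
    std::wstring wide(std::strlen(narrow) + 1, L'\0');
    std::size_t n = std::mbstowcs(&wide[0], narrow, wide.size());
    if (n == static_cast<std::size_t>(-1)) {
        return std::wstring();
    }
    wide.resize(n);  // n excludes the terminating null
    return wide;
}
```

Unlike a cast, this goes through the locale's character encoding, which is the step plain widening skips.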

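Regarding your specific question about '\xAB' vs L'\xAB': they are probably not equal. Where plain char is signed, '\xAB' has a negative value that is sign-extended when widened, while L'\xAB' is 0xAB. See http://ideone.com/b1E39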
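The comparison can be sketched as follows, assuming an 8-bit char. The function names are invented for this example; the outcome hinges on whether plain char is signed on the target.

```cpp
#include <limits>

// Hypothetical helper: does the narrow literal compare equal to the wide one?
// Both operands undergo integral promotion before the comparison.
inline bool narrow_AB_equals_wide_AB() {
    return '\xAB' == L'\xAB';
}

// Hypothetical helper: whether plain char is a signed type on this platform.
inline bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Numeric values of the two literals, widened to long for inspection.
inline long narrow_AB_value() { return '\xAB'; }   // negative if char is signed
inline long wide_AB_value()   { return L'\xAB'; }  // 0xAB
```

With a signed 8-bit char the narrow literal promotes to a negative int, so the comparison is false; with an unsigned char both sides are 0xAB and it is true.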

Seth Carnegie · Answer

As I mentioned, the standard says

A char array (whether plain char, signed char, or unsigned char), char16_t array, char32_t array, or wchar_t array can be initialized by a narrow character literal...

However, in the section for the __STDC_MB_MIGHT_NEQ_WC__ preprocessor definition, it says

The integer constant 1, intended to indicate that, in the encoding for wchar_t, a member of the basic character set need not have a code value equal to its value when used as the lone character in an ordinary character literal.

And for __STDC_ISO_10646__:

An integer constant of the form yyyymmL (for example, 199712L). If this symbol is defined, then every character in the Unicode required set, when stored in an object of type wchar_t, has the same value as the short identifier of that character.
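As a sketch, both macros can be probed at compile time. The helper names below are hypothetical, and the `static_assert` assumes the implementation defines `__STDC_MB_MIGHT_NEQ_WC__` whenever the values may differ.

```cpp
// Hypothetical helper: true if the implementation warns that a basic
// character may have different values as char and as wchar_t.
inline bool basic_chars_might_differ() {
#ifdef __STDC_MB_MIGHT_NEQ_WC__
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: true if wchar_t values are promised to be
// ISO 10646 (Unicode) code points.
inline bool wchar_is_iso_10646() {
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}

#ifndef __STDC_MB_MIGHT_NEQ_WC__
// Assumption: with the macro absent, members of the basic character set
// have the same value in both encodings.
static_assert('?' == L'?', "basic characters expected to match");
#endif
```

Neither macro says anything about narrow characters outside the basic set, which is where the two literal forms are most likely to disagree.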

I am not exactly a professional at interpreting the standard, but I think that means the answer to your question is that they may have different representations, and you should always use the L prefix.

R.. GitHub STOP HELPING ICE · Answer

The only downside is that your program might fail on stone-age systems using EBCDIC. On any real-world system worth consideration, char and wchar_t values for the portable character set are all ASCII, and on increasingly many (but not all) systems, wchar_t is a Unicode codepoint number.
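A quick way to check that claim on a given toolchain is sketched below. The function name is made up, and it samples only a few characters from the portable set rather than all of them.

```cpp
// Hypothetical check: true when a sample of portable characters has its
// ASCII value both as a narrow and as a wide literal.
constexpr bool portable_sample_is_ascii() {
    return 'A' == 0x41 && L'A' == 0x41
        && 'z' == 0x7A && L'z' == 0x7A
        && '0' == 0x30 && L'0' == 0x30
        && '?' == 0x3F && L'?' == 0x3F;
}
```

On an EBCDIC system this would return false, since 'A' is 0xC1 there.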

Tags: c++, c, character, wchar-t

user541686
