According to ISO/IEC 14882:2011 (§2.14.3), a character-literal (also called a character constant) is defined by the following grammar:
character-literal:
    ' c-char-sequence '
    u' c-char-sequence '
    U' c-char-sequence '
    L' c-char-sequence '
...
c-char:
    any member of the source character set except
        the single-quote ', backslash \, or new-line character
    escape-sequence
    universal-character-name
At first glance, it seems that writing a Unicode character directly in a character-literal, instead of using a universal-character-name, is illegal. However, most compilers, such as g++ and Visual C++, accept it without complaint, which is somewhat confusing. Does each implementation automatically convert these characters to universal-character-names before compilation begins, regardless of the standard?
I think the first phase of translation handles that (C++11 §2.2/1, phase 1):
Any source file character not in the basic source character set (2.3) is replaced by the universal-character-name that designates that character.
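So your source files may contain characters outside the basic source character set (which physical source characters are accepted is implementation-defined), but in phase 1 every such character is replaced by the universal-character-name that designates it. The grammar in §2.14.3 is only applied later, when the text is split into preprocessing tokens, so a literal written as U'é' is treated exactly like U'\u00E9'. The compilers are therefore following the standard, not ignoring it.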