 

Is it illegal directly putting in unicode in character-literal instead of using universal-character-name?

Tags: c++, c++11, unicode

According to ISO/IEC 14882:2011 (§2.14.3), a character-literal (also called a character constant) has the following grammar:

character-literal:
    ' c-char-sequence '
    u' c-char-sequence '
    U' c-char-sequence '
    L' c-char-sequence '

...

c-char:
    any member of the source character set except
        the single-quote ', backslash \, or new-line character
    escape-sequence
    universal-character-name

At a glance, this grammar seems to make it illegal to type a Unicode character directly into a character-literal instead of writing it as a universal-character-name. However, most compilers, such as g++ and Visual C++, accept such literals without complaint, which is somewhat confusing. Does each implementation automatically convert these characters to universal-character-names before compilation begins, regardless of what the standard says?

user3647351 asked May 17 '14 14:05



1 Answer

I think the first phase of translation handles that (C++11 §2.2/1, phase 1):

Any source file character not in the basic source character set (2.3) is replaced by the universal-character-name that designates that character.

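So your input files are encoded in the physical source character set, which is implementation-defined and includes the basic source character set. In the program text that the later translation phases see, every non-basic character has already been replaced by its universal-character-name, so a directly typed character is a valid c-char by the time the character-literal grammar is applied. The replacement is conceptual: the same paragraph allows an implementation to use any internal encoding, as long as a directly typed extended character and the same character written as a universal-character-name are handled equivalently.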
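As a rough illustration of that equivalence, the sketch below compares a directly typed character with the same code point written as a universal-character-name. It assumes the file is saved as UTF-8 and that the compiler reads it as UTF-8 (the default for g++; Visual C++ may need a BOM or the `/utf-8` option). The names `direct`, `escaped`, `strings_match`, and `run_checks` are made up for this example.

```cpp
#include <cassert>
#include <string>

// Assumption: this source file is UTF-8 and the compiler decodes it as UTF-8.

// The same code point spelled two ways: typed directly, and written as a
// universal-character-name. Under the C++11 phase-1 rule both spellings
// must be handled equivalently.
constexpr char32_t direct  = U'é';
constexpr char32_t escaped = U'\u00E9';
static_assert(direct == escaped, "both spellings should denote U+00E9");

// Hypothetical helper: the same comparison for UTF-32 string literals.
inline bool strings_match() {
    return std::u32string(U"naïve") == std::u32string(U"na\u00EFve");
}

// Hypothetical helper: runs the string comparison as a runtime check.
inline void run_checks() {
    assert(strings_match());
}
```

Calling `run_checks()` from `main` should pass silently under those assumptions. If the compiler decodes the file with a different source encoding, the `static_assert` is likely to fail, since the physical-to-source character mapping is implementation-defined.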

Kerrek SB answered Oct 20 '22 08:10
