
Why does this code have undefined behavior?

According to this sentence in [lex.phases]/1.2

Except for splices reverted in a raw string literal, if a splice results in a character sequence that matches the syntax of a universal-character-name, the behavior is undefined.

the snippet below has undefined behavior (live-example):

#include <iostream>

// According to [lex.phases]1.2 this has undefined behavior

const char* p = "\\
u0041";

int main()
{
    std::cout << p << '\n';
}

What's the reason for the undefined behavior?

Alexander asked May 06 '17 19:05


1 Answer

See the discussion in core issue 787:

The undefined behavior referred to above regarding universal-character-names is the result of the considerations described in the C99 Rationale, section 5.2.1, in the part entitled “UCN models.” Three different models for support of UCNs are described, each involving different conversions between UCNs and wide characters and/or at different times during program translation. Implementations, as well as the specification in a language standard, can employ any of the three, but it must be impossible for a well-defined program to determine which model was actually employed by implementation. The implication of this “equivalence principle” is that any construct that would give different results under the different models must be classified as undefined behavior. For example, an apparent UCN resulting from a line-splice would be recognized as a UCN by an implementation in which all wide characters were translated immediately into UCNs, as described in C++ phase 1, but would not be recognized as a UCN by another implementation in which all UCNs were translated immediately into wide characters (a possibility mentioned parenthetically in C++ phase 1).

T.C. answered Nov 15 '22 12:11