According to this sentence in [lex.phases]1.2
Except for splices reverted in a raw string literal, if a splice results in a character sequence that matches the syntax of a universal-character-name, the behavior is undefined.
the snippet below has undefined behavior (live-example):
#include <iostream>
// According to [lex.phases]1.2 this has undefined behavior
const char* p = "\\
u0041";
int main()
{
std::cout << p << '\n';
}
What's the reason for the undefined behavior?
See the discussion in core issue 787:
The undefined behavior referred to above regarding universal-character-names is the result of the considerations described in the C99 Rationale, section 5.2.1, in the part entitled “UCN models.” Three different models for support of UCNs are described, each involving different conversions between UCNs and wide characters and/or at different times during program translation. Implementations, as well as the specification in a language standard, can employ any of the three, but it must be impossible for a well-defined program to determine which model was actually employed by implementation. The implication of this “equivalence principle” is that any construct that would give different results under the different models must be classified as undefined behavior. For example, an apparent UCN resulting from a line-splice would be recognized as a UCN by an implementation in which all wide characters were translated immediately into UCNs, as described in C++ phase 1, but would not be recognized as a UCN by another implementation in which all UCNs were translated immediately into wide characters (a possibility mentioned parenthetically in C++ phase 1).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With