
Are trigraph substitutions reverted when a raw string is created through concatenation?

Tags: c++, c++11

It's pretty common to use macros and token concatenation to switch between wide and narrow strings at compile time.

#define _T(x) L##x
const wchar_t *wide1 = _T("hello");
const wchar_t *wide2 = L"hello";

And in C++11 it should be valid to concoct a similar thing with raw strings:

#define RAW(x) R##x
const char *raw1 = RAW("(Hello)");
const char *raw2 = R"(Hello)";

Since macro expansion and token concatenation happen before escape sequences are interpreted, this should prevent escape sequences from being replaced in the quoted string.

But how does this apply to trigraphs? Do raw strings formed by pasting the R prefix onto an ordinary string literal still have their trigraph substitutions reverted?

const char *trigraph = RAW("(??=)");      // Is this "#" or "??="?
Ben Marsh asked Jul 28 '11 06:07


1 Answer

No, the trigraph is not reverted in your example.

[lex.phases]p1 identifies three phases of translation relevant to your question:

Phase 1: Trigraph sequences are replaced by corresponding single-character internal representations.
Phase 3: The source file is decomposed into preprocessing tokens.
Phase 4: Macro invocations are expanded.

Phase 1 is defined by [lex.trigraph]p1. At this stage, your code is translated to const char *trigraph = RAW("(#)").

Phase 3 is defined by [lex.pptoken]. This is the stage where trigraphs are reverted in raw string literals. Paragraph 3 says:

If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as R", the next preprocessing token shall be a raw string literal. Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted.

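That is not the case in your example. During phase 3, the string "(#)" is not preceded by R (the R appears only in the macro's replacement list, where it is followed by ##), so it is lexed as an ordinary string literal and the trigraph is not reverted. Your code is transformed into the preprocessing-token sequence const char * trigraph = RAW ( "(#)" ).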

Finally, in phase 4, the RAW macro is expanded and the token-paste occurs, resulting in the following sequence of preprocessing-tokens: const char * trigraph = R"(#)". The r-char-sequence of the string literal comprises a #. Phase 3 has already occurred, and there is no other point at which reversion of trigraphs occurs.

Richard Smith answered Sep 20 '22 13:09