Reading through the C++17 standard, it seems to me that there is an inconsistency between pp-number as handled by the preprocessor and numeric literals, e.g. user-defined-integer-literal, as they are defined to be handled by the "upper" language.
For example, the following is correctly parsed as a pp-number according to the preprocessor grammar:
123_e+1
But placed in the context of a C++11-compliant code fragment,
int operator"" _e(unsigned long long)
{ return 0; }
int test()
{
    return 123_e+1;
}
the current Clang or GCC compilers (I haven't tested others) will return an error similar to this:

unable to find numeric literal operator 'operator""_e+1'

where operator"" _e(...) is not found, and trying to define operator"" _e+1(...) would be invalid.
It seems that this comes about because the compiler lexes the token as a pp-number first, but then fails to roll back and apply the grammar rules for a user-defined-integer-literal when parsing the final expression.
In comparison, the following code compiles fine:
int operator"" _d(unsigned long long)
{ return 0; }
int test()
{
    return 0x123_d+1; // doesn't lex as a 'pp-number' because 'sign' can only follow [eEpP]
}
Is this a correct reading of the standard? And if so, is it reasonable that the compiler should handle this, arguably rare, corner case?
You have fallen victim to the maximal munch rule, which has the lexical analyzer take as many characters as possible to form a valid token.
This is covered in section [lex.pptoken]p3, which says:
Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail, except that a header-name ([lex.header]) is only formed within a #include directive.
and includes several examples:
[ Example:

#define R "x"
const char* s = R"y"; // ill-formed raw string, not "x" "y"

— end example ]
[ Example: The program fragment 0xe+foo is parsed as a preprocessing number token (one that is not a valid floating or integer literal token), even though a parse as three preprocessing tokens 0xe, +, and foo might produce a valid expression (for example, if foo were a macro defined as 1). Similarly, the program fragment 1E1 is parsed as a preprocessing number (one that is a valid floating literal token), whether or not E is a macro name. — end example ]

[ Example: The program fragment x+++++y is parsed as x ++ ++ + y, which, if x and y have integral types, violates a constraint on increment operators, even though the parse x ++ + ++ y might yield a correct expression. — end example ]
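The 0xe+foo example is easy to reproduce directly. Here is a minimal sketch (the macro name foo comes from the quoted text; the variable name is my own):

#define foo 1
int a = 0xe + foo;  // OK: three tokens 0xe, +, foo
// int b = 0xe+foo; // error: lexes as the single pp-number 0xe+foo, which is not a valid literal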
This rule affects several other well-known cases, such as the a+++++b fragment above and the >> token, which required a special rule in C++11 before it could close nested template argument lists; a sketch follows.
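A minimal sketch of the nested-template case (the variable names are my own):

#include <vector>
std::vector<std::vector<int> > a; // pre-C++11: the space was required, since >> lexed as right-shift
std::vector<std::vector<int>>  b; // OK since C++11, which special-cases >> in this position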
For reference, the pp-number grammar is as follows:

pp-number:
    digit
    . digit
    pp-number digit
    pp-number identifier-nondigit
    pp-number ' digit
    pp-number ' nondigit
    pp-number e sign
    pp-number E sign
    pp-number p sign
    pp-number P sign
    pp-number .
Note the pp-number e sign production, which is what is snagging this case: 123_ is itself a pp-number (built from digits and the identifier-nondigit _), so 123_ followed by e and the sign + forms the pp-number 123_e+, which then absorbs the final digit to give 123_e+1. If, on the other hand, you use d as in your second example, you do not hit this (see it live on godbolt).
Adding spacing would also fix the issue, since you would no longer be subject to maximal munch (see it live on godbolt):

123_e + 1
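Putting it together, a sketch of the complete fragment with the whitespace fix applied (the operator is the one from the question); it should compile as C++11:

int operator"" _e(unsigned long long)
{ return 0; }

int test()
{
    return 123_e + 1; // three tokens: 123_e, +, 1
}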