Even though an oldtimer, I fear I do not (anymore) have a complete grasp of parsing of constants in C. The second of the following 1-liners fails to compile:
int main( void ) { return (0xe +2); }
int main( void ) { return (0xe+2); }
$ gcc -s weird.c
weird.c: In function ‘main’:
weird.c:1:28: error: invalid suffix "+2" on integer constant
int main( void ) { return (0xe+2); }
^
The reason for the compilation failure is probably that 0xe+2 is parsed as a hexadecimal floating point constant as per C11 standard clause 6.4.4.2. My question is whether a convention exists to write simple additions of hexadecimal and decimal numbers in C, I do not like to have to rely on white space in parsing.
This was with gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9). Stopping compiling after preprocessing (-E) show that the compilation failure happens in gcc not cpp.
Because GCC thinks that 0xe+2
is a floating point number, while this is just an addition of two integers.
According to cppreference:
Due to maximal munch, hexadecimal integer constants ending in
e
andE
, when followed by the operators+
or-
, must be separated from the operator with whitespace or parentheses in the source:int x = 0xE+2; // error int y = 0xa+2; // OK int z = 0xE +2; // OK int q = (0xE)+2; // OK
My question is whether a convention exists to write simple additions of hexadecimal and decimal numbers in C
The convention is to use spaces. This is actually mandated by C11 6.4 §3:
Preprocessing tokens can be separated by white space; this consists of comments (described later), or white-space characters (space, horizontal tab, new-line, vertical tab, and form-feed), or both.
Where plain space is the commonly used one.
Similar exotic issues exist here and there in the language, some examples:
---a
must be rewritten as - --a
.a+++++b
must be rewritten as a++ + ++b
.a /// comment
b;
a / // comment
b
And so on. The culprit in all of these cases is the token parser which follows the so-called "maximal munch rule", C11 6.4 §4:
If the input stream has been parsed into preprocessing tokens up to a given character, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token.
In this specific case, the pre-processor does not make any distinction between floating point constants and integer constants, when it builds up a pre-processing token called pp-number, defined in C11 6.4.8:
pp-number e sign
pp-number E sign
pp-number p sign
pp-number P sign
pp-number .A preprocessing number begins with a digit optionally preceded by a period (.) and may be followed by valid identifier characters and the character sequences e+, e-, E+, E-, p+, p-, P+, or P-.
Here, pp-number does apparently not have to be a floating point constant, as far as the pre-processor is concerned.
( As a side note, a similar convention also exists when terminating hexadecimal escape sequences inside strings. If I for example want to print the string "ABBA"
on a new line, then I can't write
puts("\xD\xABBA");
(CR+LF+string)
Because the string in this case could be interpreted as part of the hex escape sequence. Instead I have to use white space to end the escape sequence and then rely on pre-processor string concatenation: puts("\xD\xA" "BBA")
. The purpose is the same, to guide the pre-processor how to parse the code. )
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With