Conventions to write simple additions of hexadecimal and decimal numbers

Tags: c, parsing, gcc, c99

Even though I am an old-timer, I fear I no longer have a complete grasp of how constants are parsed in C. The second of the following one-liners fails to compile:

int main( void ) { return (0xe +2); }
int main( void ) { return (0xe+2); }

$ gcc -s weird.c

weird.c: In function ‘main’:
weird.c:1:28: error: invalid suffix "+2" on integer constant
int main( void ) { return (0xe+2); }
                           ^

The reason for the compilation failure is probably that 0xe+2 is parsed as a hexadecimal floating point constant as per C11 standard clause 6.4.4.2. My question is whether a convention exists to write simple additions of hexadecimal and decimal numbers in C. I would rather not have to rely on white space for correct parsing.

This was with gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9). Stopping after preprocessing (-E) shows that the compilation failure happens in gcc proper, not in cpp.

Baard, asked Apr 11 '18 06:04


2 Answers

GCC reads 0xe+2 as a single numeric token, as if e+2 were the exponent of a floating point constant, although an addition of two integers is what was intended.

According to cppreference:

Due to maximal munch, hexadecimal integer constants ending in e and E, when followed by the operators + or -, must be separated from the operator with whitespace or parentheses in the source:

int x = 0xE+2;   // error
int y = 0xa+2;   // OK
int z = 0xE +2;  // OK
int q = (0xE)+2; // OK
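
The safe spellings above can be checked in a small sketch. The function names are hypothetical and chosen only for illustration; the third variant (decimal operand first) is not from the quoted text, but it works because a decimal digit followed by + never forms an exponent pair such as e+.

```c
/* Each function returns 0xE + 2, i.e. 16, spelled so that the
   operator cannot be absorbed into the hexadecimal constant. */

/* White space ends the constant before the operator. */
int sum_with_space(void)
{
    return 0xE +2;
}

/* The closing parenthesis ends the constant before the operator. */
int sum_with_parens(void)
{
    return (0xE)+2;
}

/* With the decimal operand first, "2+" is not an exponent pair,
   so no white space is needed at all. */
int sum_decimal_first(void)
{
    return 2+0xE;
}
```

All three compile without complaint, whereas the unspaced 0xE+2 is rejected.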
msc, answered Nov 01 '22 08:11


My question is whether a convention exists to write simple additions of hexadecimal and decimal numbers in C

The convention is to use spaces. This is what C11 6.4 §3 provides for:

Preprocessing tokens can be separated by white space; this consists of comments (described later), or white-space characters (space, horizontal tab, new-line, vertical tab, and form-feed), or both.

Of these, the plain space is the one commonly used.

Similar exotic issues exist elsewhere in the language. Some examples:

  • ---a is tokenized as -- -a, so negating a pre-decrement must be written as - --a.
  • a+++++b is tokenized as a ++ ++ + b, so it must be rewritten as a++ + ++b.
  • a /// comment
    b;
    must be rewritten as
    a / // comment
    b;
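
As a minimal sketch, the rewritten forms from the list can be put into small functions. The function names are hypothetical, and the // comment assumes C99 or later.

```c
/* Pre-decrement a, then negate the result. Without the space,
   "---a" would be split as "-- -a", which does not compile. */
int negate_predecrement(int a)
{
    return - --a;
}

/* Post-increment a, pre-increment b, and add the two values.
   The spaces keep the tokenizer from forming "++ ++ +". */
int post_plus_pre(int a, int b)
{
    return a++ + ++b;
}

/* Divide a by b with a comment after the operator. */
int divide_across_comment(int a, int b)
{
    return a / // the space keeps '/' apart from the comment opener
           b;
}
```

Removing any of the separating spaces changes how the characters are grouped into tokens: the first two functions stop compiling, and in the third the division operator disappears into the comment.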

And so on. The culprit in all of these cases is the tokenizer, which follows the so-called "maximal munch rule", C11 6.4 §4:

If the input stream has been parsed into preprocessing tokens up to a given character, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token.

In this specific case, the pre-processor does not make any distinction between floating point constants and integer constants, when it builds up a pre-processing token called pp-number, defined in C11 6.4.8:

pp-number e sign
pp-number E sign
pp-number p sign
pp-number P sign
pp-number .

A preprocessing number begins with a digit optionally preceded by a period (.) and may be followed by valid identifier characters and the character sequences e+, e-, E+, E-, p+, p-, P+, or P-.

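So a pp-number does not have to be a valid floating point constant, or a valid constant of any kind, as far as the pre-processor is concerned. 0xe+2 matches the pp-number grammar as a whole, and the error only appears later, when that single token is converted to a constant.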
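
To make the rule concrete, here is a rough sketch of a maximal munch scanner for pp-numbers. The function name pp_number_length is hypothetical, and the sketch simplifies the grammar by assuming plain ASCII input and ignoring universal character names.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the longest pp-number that starts at s,
   or 0 if no pp-number starts there. */
size_t pp_number_length(const char *s)
{
    size_t i;

    /* A pp-number begins with a digit, or with '.' and a digit. */
    if (isdigit((unsigned char)s[0])) {
        i = 1;
    } else if (s[0] == '.' && isdigit((unsigned char)s[1])) {
        i = 2;
    } else {
        return 0;
    }

    for (;;) {
        unsigned char c = (unsigned char)s[i];

        if ((c == 'e' || c == 'E' || c == 'p' || c == 'P') &&
            (s[i + 1] == '+' || s[i + 1] == '-')) {
            /* e+, e-, E+, E-, p+, p-, P+, P- are taken as a pair. */
            i += 2;
        } else if (isalnum(c) || c == '_' || c == '.') {
            /* Digits, identifier characters and periods extend it. */
            i += 1;
        } else {
            break;
        }
    }
    return i;
}
```

For "0xe+2" the scanner consumes all five characters, because e+ is taken as a pair. For "0xa+2" and "0xe +2" it stops after three, which leaves the + as a separate token.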


( As a side note, a similar convention applies when terminating hexadecimal escape sequences inside strings. If, for example, I want to print the string "BBA" on a new line, then I can't write

puts("\xD\xABBA"); (CR+LF+string)

because a hex escape sequence extends over every hex digit that follows it, so the compiler reads \xABBA as one out-of-range escape. Instead I have to end the string literal right after the escape and rely on the concatenation of adjacent string literals: puts("\xD\xA" "BBA"). The purpose is the same: to tell the compiler where one piece ends and the next begins. )
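
A small sketch of the split-literal form follows. The array names are hypothetical, and the comparison with "\r\n" assumes an ASCII execution character set.

```c
/* Ending the literal after \xA stops the hex escape there. The
   adjacent literals are then joined into a single string. */
const char crlf_split[] = "\xD\xA" "BBA";

/* The same bytes written with the standard escapes, which cannot
   absorb the letters that follow them (ASCII assumed). */
const char crlf_plain[] = "\r\nBBA";
```

Both arrays hold six bytes: CR, LF, 'B', 'B', 'A' and the terminating null.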

Lundin, answered Nov 01 '22 10:11