
Macro expansion with unary minus

Consider the following code:

#define A -100

//later..
void Foo()
{
  int bar = -A;
  //etc..
}

This compiles fine on the major compilers I tested (MSVC, GCC, Clang), and bar == 100 as expected. That is because the preprocessors of all those compilers insert a space between the tokens, so you end up with:

int bar = - -100;

As I'd like my code to be as portable as possible, I went to check whether this behavior is defined by the standard, but I can't find anything on it. Is this behavior guaranteed by the standard, or is it just a compiler feature? Is the naive approach, bar = --100; (which obviously wouldn't compile), allowed too?

Hatted Rooster asked May 20 '19 18:05


2 Answers

This is specified in the language: the two - characters will not end up being concatenated to form a -- operator.

This absence of concatenation is ensured by the way source files must be parsed. Macro expansion is performed in translation phase 4. Before that, during translation phase 3, the source file must be transformed into a sequence of preprocessing tokens and white space ([lex.phases]/3):

The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is unspecified.

So after translation phase 3, the sequence of tokens near the definition of bar may look like:

// here {...,...,...} is used to list preprocessing tokens.
{int, ,bar, ,=, ,-,A,;}

Then after phase 4 you will get:

{int, ,bar, ,=, ,-,-, ,100,;}

Whitespace is conceptually removed in phase 7:

{int,bar,=,-,-,100,;}
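A minimal sketch of how this could be checked on a given compiler, assuming C++11 or later for constexpr and static_assert. The macro A is the one from the question; negated_a and check_bar are hypothetical helper names introduced only for this illustration.

```cpp
#include <cassert>

#define A -100

// Hypothetical helper: applies unary minus to the expansion of A.
// After phase 4 the token sequence is `-`, `-`, `100`; the two `-`
// preprocessing tokens are never re-lexed into a single `--`.
constexpr int negated_a() { return -A; }

// Compile-time check: fails to compile if -A is not -(-100).
static_assert(negated_a() == 100, "-A must parse as -(-100)");

// Hypothetical runtime check mirroring the code in the question.
inline void check_bar()
{
    int bar = -A;
    assert(bar == 100);
}
```

If the two - tokens were merged into --, the helper would not compile at all, because --100 applies a decrement to a literal.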
Oliv answered Nov 17 '22 14:11


Once the input is split into preprocessing tokens at the early stages of translation, the only way to make two adjacent preprocessing tokens merge into one token is the ## operator of the preprocessor. This is what the ## operator is for, and this is why it is necessary.

Once the preprocessing is complete, the compiler proper will analyze the code in terms of pre-parsed preprocessing tokens. The compiler proper will not attempt to merge two adjacent tokens into one token.

In your example the inner - and the outer - are two different preprocessing tokens. They will not merge into one -- token and they will not be seen by the compiler proper as one -- token.

For example:

#define M1(a, b) a-b
#define M2(a, b) a##-b

int main()
{
  int i = 0;
  int x = M1(-, i); // interpreted as `int x = -(-i);`
  int y = M2(-, i); // interpreted as `int y = --i;` 
}

This is how the language specification defines the behavior.
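The difference between the two macros can also be observed through the values they produce. The following is a rough sketch that reuses M1 and M2 from above; negate_twice and pre_decrement are hypothetical helper names, and the parameter is taken by value so that the decrement in the second helper affects only the local copy.

```cpp
#include <cassert>

#define M1(a, b) a-b
#define M2(a, b) a##-b

// Hypothetical helper: M1(-, i) yields two separate `-` tokens,
// so the expression is -(-i) and i is left unchanged.
inline int negate_twice(int i) { return M1(-, i); }

// Hypothetical helper: M2(-, i) pastes `-` and `-` into one `--` token,
// so the expression is --i and the local copy of i is decremented.
inline int pre_decrement(int i) { return M2(-, i); }

// Small self-check of the two interpretations.
inline void check_macros()
{
    assert(negate_twice(5) == 5);
    assert(pre_decrement(5) == 4);
}
```

With an argument of 5, the first helper returns 5 and the second returns 4, which matches the interpretations given in the comments of the example above.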

In practical implementations the preprocessing stage and the compiling stage are usually decoupled from each other. And the output of preprocessing stage is typically represented in plain text form (not as some database of tokens). In such implementations the preprocessor and the compiler proper have to agree on some convention on how to separate adjacent ("touching") preprocessing tokens. Typically the preprocessor will insert an extra space between two separate tokens that happen to "touch" in source code.

The standard does not say anything about that extra space, and formally it is not supposed to be there, but this is just how this separation is typically implemented in practice.

Note that since that space "is not supposed to be there", such implementations will also have to make some effort to ensure that this extra space is "undetectable" in other contexts. For example:

#include <iostream>

#define M1(a, b) a-b
#define M2(a, b) a##-b

#define S_(x) #x
#define S(x) S_(x)

int main()
{
  std::cout << S(M1(-, i)) << std::endl; // outputs `--i`
  std::cout << S(M2(-, i)) << std::endl; // outputs `--i`
}

Both lines of main are supposed to output --i.
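One way to probe a particular implementation is to capture the stringified results and compare them. This is only a sketch: separate_tokens, pasted_token, and without_spaces are hypothetical names, and the check for the M1 case deliberately ignores spaces, so it documents what to look for without assuming how a given preprocessor behaves.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

#define M1(a, b) a-b
#define M2(a, b) a##-b

#define S_(x) #x
#define S(x) S_(x)

// Stringified expansions; S expands its argument before S_ stringifies it.
static const char* const separate_tokens = S(M1(-, i)); // tokens: - - i
static const char* const pasted_token    = S(M2(-, i)); // tokens: -- i

// Hypothetical helper: returns a copy of s with all space characters removed.
inline std::string without_spaces(std::string s)
{
    s.erase(std::remove(s.begin(), s.end(), ' '), s.end());
    return s;
}

// The pasted case contains no white space between its tokens at all.
// The separate-token case is compared with spaces stripped, so the check
// passes even if an implementation leaks its separator space.
inline void check_stringification()
{
    assert(std::string(pasted_token) == "--i");
    assert(without_spaces(separate_tokens) == "--i");
}
```

Printing separate_tokens directly shows whether the implementation at hand keeps its separator space out of the string literal.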

So, to answer your original question: yes, your code is portable in the sense that in a standard-compliant implementation those two - characters will never become a --. But the actual insertion of a space is just an implementation detail. Some other implementation might use a different technique to prevent those - characters from merging into a --.

AnT answered Nov 17 '22 14:11