Why is MSVC preprocessor concatenating tokens differently than GCC and Clang?

Recently, I came across a problem with MSVC. Here is a minimal example of it.

#define NUMBERSIGNS(a,b) a##b
#define CONCAT(a,b) NUMBERSIGNS(a,b) 
#define AA
#define BB  
CONCAT(B, CONCAT(A, A B))

What I am thinking:

Since macro arguments adjacent to ## are not expanded before pasting, I wrap the ## in a helper macro NUMBERSIGNS(a,b) and call it through CONCAT(a,b), so that the arguments are fully expanded before they are concatenated.
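The effect of the extra level can be seen in a small sketch. LEFT, RIGHT and LEFTRIGHT are hypothetical macros; LEFTRIGHT exists only so that the unexpanded paste produces something visible. Calling NUMBERSIGNS directly pastes the argument names as written, while going through CONCAT expands them to 1 and 2 first. The values in the comments assume a conforming preprocessor such as GCC or Clang, and the functions can be called from any main.

```c
#define NUMBERSIGNS(a,b) a##b
#define CONCAT(a,b) NUMBERSIGNS(a,b)

#define LEFT 1          /* hypothetical */
#define RIGHT 2         /* hypothetical */
#define LEFTRIGHT 99    /* hypothetical: lets the unexpanded paste compile */

/* Parameters next to ## are not expanded: LEFT ## RIGHT -> LEFTRIGHT -> 99. */
int direct_paste(void) { return NUMBERSIGNS(LEFT, RIGHT); }

/* CONCAT expands its arguments first, so NUMBERSIGNS sees 1 and 2: 1 ## 2 -> 12. */
int indirect_paste(void) { return CONCAT(LEFT, RIGHT); }
```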

When CONCAT(B, CONCAT(A, A B)) is expanded, I expect the inner CONCAT(A, A B) to expand to AA B, yielding CONCAT(B, AA B).

Then AA expands to nothing, and we get CONCAT(B, B). (I suspect MSVC does not perform this step, and I don't know whether it should.)

Finally, the pasted token BB is rescanned and expands to nothing.
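The expected chain can be checked stage by stage by stringizing each step. This is a sketch: STR and XSTR are hypothetical helper macros (XSTR fully expands its argument, then STR stringizes it), and the results in the comments are what a conforming preprocessor such as GCC or Clang should produce.

```c
#define NUMBERSIGNS(a,b) a##b
#define CONCAT(a,b) NUMBERSIGNS(a,b)
#define AA
#define BB

/* Hypothetical helpers: XSTR fully expands its argument, then STR stringizes it. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Inner call: A ## A B -> AA B; rescanning drops AA, leaving "B". */
const char *inner_step(void) { return XSTR(CONCAT(A, A B)); }

/* Outer call with its expanded arguments: B ## B -> BB -> nothing, so "". */
const char *outer_step(void) { return XSTR(CONCAT(B, B)); }

/* The whole expression from the question, also "". */
const char *whole_expression(void) { return XSTR(CONCAT(B, CONCAT(A, A B))); }

/* 1 if the whole expression expanded to no tokens at all. */
int expands_to_nothing(void) { return whole_expression()[0] == '\0'; }
```

With MSVC's traditional preprocessor the bare expression gives BAA B, so at least the last check is not expected to hold there.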

When preprocessed by GCC and Clang, the code expands to nothing (empty output), which is my desired result. MSVC, however, gives:

BAA B

Is this a bug in MSVC, or does my code have undefined behavior?

EDIT:

Thanks to the answers, the problem has been identified: MSVC did not conform to the Standard.

However, Microsoft seems to have recently started taking the Standard seriously: they added a new /Zc:preprocessor option that enables a fully conforming mode of their C/C++ preprocessor. See: Announcing full support for a C/C++ conformant preprocessor in MSVC
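If a project depends on the conforming behavior, one option is to make a non-conforming build fail at compile time. This sketch assumes MSVC's predefined macro _MSVC_TRADITIONAL (0 under /Zc:preprocessor, 1 with the traditional preprocessor; older MSVC versions do not define it) and C11 _Static_assert. STR, XSTR and expansion_size are hypothetical names, and __clang__ is excluded because clang-cl also defines _MSC_VER.

```c
/* Fail early on MSVC's traditional preprocessor. _MSVC_TRADITIONAL is 0
   under /Zc:preprocessor and 1 otherwise; older MSVC versions do not
   define it. clang-cl defines _MSC_VER but uses Clang's preprocessor. */
#if defined(_MSC_VER) && !defined(__clang__) && \
    (!defined(_MSVC_TRADITIONAL) || _MSVC_TRADITIONAL)
#error "This code needs a conforming preprocessor: compile with /Zc:preprocessor"
#endif

#define NUMBERSIGNS(a,b) a##b
#define CONCAT(a,b) NUMBERSIGNS(a,b)
#define AA
#define BB
#define STR(x) #x        /* hypothetical helper */
#define XSTR(x) STR(x)   /* hypothetical helper */

/* An empty expansion stringizes to "", whose size is 1 (only the '\0'). */
_Static_assert(sizeof(XSTR(CONCAT(B, CONCAT(A, A B)))) == 1,
               "CONCAT(B, CONCAT(A, A B)) must expand to no tokens");

/* Same quantity, available at run time. */
int expansion_size(void) { return (int)sizeof(XSTR(CONCAT(B, CONCAT(A, A B)))); }
```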

asked Jun 13 '20 17:06 by blingblingbling





1 Answer

C 2018 6.10.3.1 1 (clause 6.10.3.1, paragraph 1) specifies macro argument substitution:

After the arguments for the invocation of a function-like macro have been identified, argument substitution takes place. A parameter in the replacement list, unless preceded by a # or ## preprocessing token or followed by a ## preprocessing token (see below), is replaced by the corresponding argument after all macros contained therein have been expanded. Before being substituted, each argument’s preprocessing tokens are completely macro replaced as if they formed the rest of the preprocessing file; no other preprocessing tokens are available.
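The "unless preceded by a # or ##" clause is easy to observe with stringizing. In this sketch VALUE, STR and XSTR are hypothetical: STR applies # directly to its parameter, so the argument is used as written, while XSTR's parameter is not next to # or ##, so its argument is completely macro replaced before it is substituted into STR.

```c
#define VALUE 42         /* hypothetical macro */
#define STR(x) #x        /* x follows #: the argument is NOT expanded first */
#define XSTR(x) STR(x)   /* x is not next to # or ##: the argument IS expanded first */

const char *as_written(void) { return STR(VALUE); }   /* "VALUE" */
const char *as_expanded(void) { return XSTR(VALUE); } /* "42" */
```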

In CONCAT ( B , CONCAT ( A , A B ) ), the first CONCAT macro has arguments B and CONCAT ( A , A B ). These arguments are completely macro replaced first.

B is not a macro, so it remains B.

In CONCAT ( A , A B ), the arguments A and A B are completely macro replaced, but they contain no macro names, so they remain A and A B.

Then CONCAT ( A , A B ) is replaced by NUMBERSIGNS ( A , A B ).

Then 6.10.3.4 1 tells us:

After all parameters in the replacement list have been substituted and # and ## processing has taken place, all placemarker preprocessing tokens are removed. The resulting preprocessing token sequence is then rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace.

So NUMBERSIGNS ( A , A B ) is replaced by A ## A B. Then the tokens immediately before and after the ## (A and A) are concatenated, forming AA B (per 6.10.3.3 3).
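Pasting only affects the two tokens adjacent to ##, which is why B survives as a separate token. A minimal numeric sketch, reusing NUMBERSIGNS from the question with hypothetical arguments:

```c
#define NUMBERSIGNS(a,b) a##b

/* a is "2 * 1" and b is "0 + 3". Only the 1 and the 0 touch the ##, so
   they become the single token 10 and the result is 2 * 10 + 3. */
int paste_adjacent_only(void) { return NUMBERSIGNS(2 * 1, 0 + 3); }
```

If ## joined the whole arguments instead, the result would not be 2 * 10 + 3; the value 23 shows that only the neighboring tokens were merged.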

This sequence AA B is then again rescanned, per 6.10.3.4 1. Since AA is a macro, it is replaced with no tokens, leaving just B. This completes the expansion of the second argument of the first CONCAT.
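The rescan is what removes AA. In this sketch AA is empty as in the question, and CC is a hypothetical non-empty macro added for comparison; in both cases the name produced by ## is found and replaced when the result is rescanned.

```c
#define NUMBERSIGNS(a,b) a##b
#define AA            /* empty, as in the question */
#define CC 100        /* hypothetical, for comparison */

/* A ## A 7 -> AA 7; rescanning replaces AA with nothing, leaving 7. */
int rescan_empty(void) { return NUMBERSIGNS(A, A 7); }

/* C ## C + 7 -> CC + 7; rescanning replaces CC with 100, giving 107. */
int rescan_value(void) { return NUMBERSIGNS(C, C + 7); }
```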

Thus, once its arguments are fully expanded, the outer invocation is effectively CONCAT ( B , B ).

Now CONCAT is replaced, forming NUMBERSIGNS ( B , B ).

Since NUMBERSIGNS is a macro, this is replaced by B ## B. Then the tokens before and after ## are concatenated, forming BB.

This is rescanned, and BB is replaced with no tokens.

The final result is no tokens. GCC and Clang are correct, and MSVC’s result does not conform to the C standard.

answered Oct 19 '22 04:10 by Eric Postpischil
