I just found out that gcc seems to treat the result of the expansion of a function-like macro as a separate token. Here is a simple example showing the behavior of gcc:
#define f() foo
void f()_bar(void);
void f()bar(void);
void f()-bar(void);
When I execute gcc -E -P test.c (running just the preprocessor), I get the following output:
void foo _bar(void);
void foo bar(void);
void foo-bar(void);
It seems like, in the first two definitions, gcc inserts space after the expanded macro to ensure it is a separate token. Is that really what is happening here?
Is this mandated by any standard (I couldn't find documentation on the topic)?
I want to make _bar part of the same token. Is there any way to do this? I could use the token concatenation operator ##, but it would require several levels of macros (since in the real code f() is more complex). I was wondering if there is a simpler (and probably more readable) solution.
It seems like, in the first two definitions, gcc inserts space after the expanded macro to ensure it is a separate token. Is that really what is happening here?
Yes.
Is this mandated by any standard (I couldn't find documentation on the topic)?
Yes, although an implementation is allowed to insert more than one whitespace character to separate the tokens.
In f()_bar you have 4 tokens after lexical analysis (they are actually preprocessing tokens at this stage, but let's call them tokens): f, (, ) and _bar.
The function-like macro replacement semantics (as defined in C11, 6.10.3) replace the 3 tokens f, ( and ) with a single new token, foo. Macro replacement is not allowed to touch any other token, so the trailing _bar token must survive unchanged. To preserve it in the textual output, the implementation has to insert at least one whitespace character; otherwise the result would be foo_bar, which is a single token.
The gcc preprocessor manual somewhat documents this in its section on tokenization:
Once the input file is broken into tokens, the token boundaries never change, except when the ‘##’ preprocessing operator is used to paste tokens together. See Concatenation. For example,
#define foo() bar
foo()baz
     ==> bar baz
not
     ==> barbaz
In the other case, f()-bar, there are 5 tokens: f, (, ), - and bar. (- is a punctuator token in C, whereas the _ in _bar is simply a character of the identifier token.) The implementation does not have to insert a token separator (such as whitespace) here, because after macro replacement foo-bar still lexes as three separate tokens: foo, - and bar.
The gcc preprocessor (cpp) does not insert whitespace here simply because it does not have to. The cpp documentation on token spacing says (about a different issue):
However, we would like to keep space insertion to a minimum, both for aesthetic reasons and because it causes problems for people who still try to abuse the preprocessor for things like Fortran source and Makefiles.
I haven't addressed a solution to your problem in this answer, but I think you have to use the operator that exists specifically to concatenate tokens: the ## token-pasting operator.
The only way I can think of (if you cannot use the token concatenation operator ##) is to use traditional (pre-standard) C preprocessing:
gcc -E -P -traditional-cpp test.c
Output:
void foo_bar(void);
void foobar(void);
void foo-bar(void);