I have a small problem with the preprocessor that puzzles me, and I cannot find any explanation for it in the documentation or in the preprocessor/language spec.
#define booboo() aaa
booboo()bbb
booboo().bbb
is preprocessed into:
aaa bbb   <--- why is space added here
aaa.bbb
After handling trigraphs, continued lines and comments, the preprocessor works on preprocessor directives and divides the input into preprocessing tokens and whitespace. booboo's replacement list consists of one pp-token, the identifier 'aaa'. booboo()bbb is divided into the pp-tokens 'booboo', '(', ')', 'bbb'. The sequence 'booboo', '(', ')' is recognised as a function-like macro invocation and should be expanded to 'aaa', so imho the output should look like 'aaabbb'. I say "look like" because to a human it would read as one token, whereas the compiler would get 2 tokens, 'aaa' and 'bbb', since no '##' operator (which allows pp-token concatenation) was used. What rule makes cpp (the C preprocessor) place an additional space between 'aaa' and 'bbb', when 'booboo().bbb' results in 'aaa.bbb' without a space?
Is this because cpp tries to make its output (which is mostly for humans) unambiguous? A human cannot tell that 'aaabbb' is composed of 2 tokens, since only the tokens' spelling is visible. Am I right? I've read the C99 documentation about the preprocessor and gcc's documentation for cpp, and I see nothing about it.
If I am right, we have a similar situation here:
#define baba() +
baba()+
baba()-
results in:
+ +
+-
Otherwise (if '++' were the output) it would look to a human like a single '++' token, although there would be 2 tokens, '+' and '+'. Is it like with the '##' operator, where cpp checks whether concatenation produces a valid token, except that in the cases shown it wants to keep a human from thinking that concatenation was performed? '+-' is not ambiguous, hence no space is added.
The preprocessor provides the ability for the inclusion of header files, macro expansions, conditional compilation, and line control. In many C implementations, it is a separate program invoked by the compiler as the first part of translation.
The C preprocessor is a macro processor that is used automatically by the C compiler to transform your program before actual compilation. It is called a macro processor because it allows you to define macros, which are brief abbreviations for longer constructs.
The C++ compiler generally ignores whitespace, with a few minor exceptions (when processing text literals). For this reason, we say that C++ is a whitespace-independent language.
Preprocessing can be thought of as an early phase of translation: it runs before the compiler proper, not when the program is executed. Preprocessor directives are lines that begin with a hash symbol (#).
The result of preprocessing is to transform the source file into a list of tokens. In your case the list of tokens would look like this after tokenization:
....
booboo
(
)
bbb
....
and then after macro replacement:
....
aaa
bbb
....
Then the compiler translates the list of tokens into an executable.
The whitespace you are seeing is just an implementation detail: it is how your implementation has chosen to lay out the preprocessing tokens when it displays an intermediate result as text. A space is printed between 'aaa' and 'bbb' because 'aaabbb' would be read back as a single identifier, while 'aaa.bbb' cannot be misread, so no space is needed there. The standards say nothing about any intermediate files, and they do not require a separate program to do the preprocessing either.