From my experience, the C preprocessor just behaves as no-op when running on a previously preprocessed source. But is this behaviour guaranteed by the standard? Or maybe an implementation could have a preprocessor that modifies previously preprocessed code and for example removes/modifies line directives, or performs other modifications that could confuse the compiler?
In general, preprocessing via cpp
is not guaranteed to be idempotent (a noop after the first run). A simple counterexample:
#define X #define Y z
X
Y
The first invocation will yield:
#define Y z
Y
The second one:
z
Having said that, valid C code shouldn't be doing something like that (because the output wouldn't be valid input for next stages of the compiler).
Moreover, depending on what you are trying to do, cpp
has options like -fpreprocessed
that may help.
The standard does not define a "preprocessor" as a separate component. The closest it comes is in the description of phase 4 of the translation process in §5.1.1.2:
- Preprocessing directives are executed, macro invocations are expanded, and
_Pragma
unary operator expressions are executed. If a character sequence that matches the syntax of a universal character name is produced by token concatenation (6.10.3.3), the behavior is undefined. A#include
preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.
However, the translation phases defined in that section are not separable, nor are they guaranteed to be independent of each other:
Implementations shall behave as if these separate phases occur, even though many are typically folded together in practice. Source files, translation units, and translated translation units need not necessarily be stored as files, nor need there be any one-to-one correspondence between these entities and any external representation. The description is conceptual only, and does not specify any particular implementation. (Footnote 6 from the same section.)
So there is no contemplated mechanism to extract the result of translation phases 1-4 in any form, much less as a text file -- in fact, if the translation phases were implemented precisely as described, the output of phase 4 would be a sequence of tokens -- and neither is there a mechanism to feed that output back into the translator.
In other words, you might have some tool which calls itself a preprocessor, and it might even be part of a compiler suite. But that tool's behaviour is outside of the scope of the C standard. So there are no guarantees at all from the standard.
By the way, if the token stream which comes out of phase 4 were naively converted to text, it might not correctly preserve token boundaries. Most preprocessor tools inject extra whitespace at points where this would otherwise occur. That allows the output of the tool to be fed into a compiler, at least in most cases. (See @acorn's answer for an example where this wouldn't work correctly.) But this behaviour is neither required nor regulated by the standard, either.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With