 

Is it safe to run the C preprocessor several times on the same source?

From my experience, the C preprocessor just behaves as a no-op when run on previously preprocessed source. But is this behaviour guaranteed by the standard? Or could an implementation have a preprocessor that modifies previously preprocessed code, for example by removing or modifying line directives, or by making other changes that could confuse the compiler?

asked Dec 31 '22 18:12 by cesss


2 Answers

In general, preprocessing via cpp is not guaranteed to be idempotent (a no-op after the first run). A simple counterexample:

#define X #define Y z
X
Y

The first invocation will yield:

 #define Y z
Y

The second one:

z

Having said that, valid C code shouldn't be doing something like that, because the output of the first run (a stray # define Y z token sequence) wouldn't be valid input for the later stages of the compiler.
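To make the single-pass behaviour concrete, here is a minimal C sketch. The helper macros STR_ and STR and the functions x_expansion and y_is_macro are hypothetical names introduced only for illustration. The sketch stringizes the expansion of X and then checks whether Y has become a macro. Within one pass, expanding X only produces the tokens # define Y z, and the standard says such a macro-replaced sequence is not processed as a directive even if it resembles one (C11 6.10.3.4p3), so Y should stay undefined. Only re-lexing that text in a second run turns it into a real #define.

```c
/* Acorn's macro: in an object-like macro, # is an ordinary token, so the
   replacement list is simply the tokens  #  define  Y  z. */
#define X #define Y z

/* Hypothetical two-level stringizing helpers: STR expands its argument
   first, then STR_ turns the resulting tokens into a string literal. */
#define STR_(tokens) #tokens
#define STR(tokens) STR_(tokens)

/* Returns the text of X's expansion, e.g. "#define Y z". */
const char *x_expansion(void)
{
    return STR(X);
}

/* Expanding X above produced tokens but executed no #define,
   so Y is still not a macro at this point in the same pass. */
int y_is_macro(void)
{
#ifdef Y
    return 1;
#else
    return 0;
#endif
}
```

Called from a main of your own, x_expansion() should return the #define text while y_is_macro() returns 0.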

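Moreover, depending on what you are trying to do, cpp has options like -fpreprocessed that may help. That option tells GCC's preprocessor that the input has already been preprocessed, which suppresses macro expansion and the processing of most directives.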

answered Jan 11 '23 23:01 by Acorn


The standard does not define a "preprocessor" as a separate component. The closest it comes is in the description of phase 4 of the translation process in §5.1.1.2:

  4. Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a universal character name is produced by token concatenation (6.10.3.3), the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.

However, the translation phases defined in that section are not separable, nor are they guaranteed to be independent of each other:

  Implementations shall behave as if these separate phases occur, even though many are typically folded together in practice. Source files, translation units, and translated translation units need not necessarily be stored as files, nor need there be any one-to-one correspondence between these entities and any external representation. The description is conceptual only, and does not specify any particular implementation. (Footnote 6 from the same section.)

So the standard contemplates no mechanism for extracting the result of translation phases 1-4 in any form, much less as a text file, and no mechanism for feeding that output back into the translator. (In fact, if the translation phases were implemented precisely as described, the output of phase 4 would be a sequence of preprocessing tokens, not text.)

In other words, you might have some tool which calls itself a preprocessor, and it might even be part of a compiler suite. But that tool's behaviour is outside the scope of the C standard, so the standard gives no guarantees at all.

By the way, if the token stream that comes out of phase 4 were naively converted to text, it might not correctly preserve token boundaries: adjacent tokens could run together and be re-read as a different token. Most preprocessor tools inject extra whitespace at points where this would otherwise happen. That allows the output of the tool to be fed into a compiler, at least in most cases. (See @acorn's answer for an example where this wouldn't work correctly.) But this behaviour is neither required nor regulated by the standard, either.
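A minimal C sketch of the token-boundary point, where PLUS and add_one_via_macro are hypothetical names chosen for illustration. After phase 4 the expression below is the four tokens x + + 1, which the compiler parses as x + (+1). If a tool printed those tokens as the text "x ++ 1", re-reading that text would produce the ++ operator instead and fail to compile, which is why a preprocessor that emits text has to keep such tokens apart.

```c
/* Hypothetical object-like macro whose replacement is a single + token. */
#define PLUS +

/* Phase 3 lexes "PLUS+" as the identifier PLUS followed by a separate +.
   Phase 4 replaces PLUS, leaving the tokens  x + + 1, i.e. x + (+1).
   The two + tokens are never merged into ++. */
int add_one_via_macro(int x)
{
    return x PLUS+ 1;
}
```

For example, add_one_via_macro(1) returns 2.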

answered Jan 12 '23 01:01 by rici