
What does it mean that the language of preprocessor directives is weakly related to the grammar of C?

The Wikipedia article on the C Preprocessor says:

The language of preprocessor directives is only weakly related to the grammar of C, and so is sometimes used to process other kinds of text files.

How is the language of a preprocessor different from C grammar? What are the advantages? Has the C Preprocessor been used for other languages/purposes?

Can it be used to differentiate between inline functions and macros, since inline functions have the syntax of a normal C function whereas macros use slightly different grammar?

Neeraj Lohia asked Jul 27 '17 07:07



2 Answers

The Wikipedia article is not really an authoritative source for the C programming language. The C preprocessor grammar is part of the C grammar. However, it is completely distinct from the phrase structure grammar: the two are not related at all, except that both treat the input as a sequence of C language tokens. Even there they differ slightly: the preprocessor has the concept of preprocessing numbers (pp-numbers), which means that something like 123_abc is a legal preprocessing token even though it is not a valid C constant.
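As a rough illustration of the pp-number point, the sketch below passes 123_abc to the # operator, which only needs its argument to be a preprocessing token. The names STR, pp_number_spelling and check_pp_number are hypothetical helpers made up for this example, not anything from the standard.

```c
#include <assert.h>
#include <string.h>

/* STR is a hypothetical helper: # turns the spelling of its argument
   into a string literal, so no C token is ever formed from it. */
#define STR(x) #x

/* 123_abc is a single pp-number, so the preprocessor accepts it here. */
const char *pp_number_spelling(void) {
    return STR(123_abc);
}

/* Sanity check that the spelling survived unchanged. */
void check_pp_number(void) {
    assert(strcmp(pp_number_spelling(), "123_abc") == 0);
}

/* By contrast, a line such as
       int bad = 123_abc;
   should be rejected, because in translation phase 7 the pp-number
   would have to become a valid C constant. */
```

The pp-number is fine as long as it is consumed by the preprocessor (stringified, pasted, or skipped) before the conversion to C tokens takes place.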

After the preprocessing has been completed and before the translation using the phrase structure grammar commences (the preprocessor directives have by now been removed, and macros expanded and so forth),

Each preprocessing token is converted into a token. (C11 5.1.1.2p1 item 7)


Using the C preprocessor for any other language is really abuse. The reason is that the preprocessor requires the file to consist of proper C preprocessing tokens. It isn't designed to work for any other language. Even C++, with its recent extensions such as raw string literals, cannot be preprocessed by a C preprocessor!

Here's an excerpt from the cpp (GNU C preprocessor) manuals:

The C preprocessor is intended to be used only with C, C++, and Objective-C source code. In the past, it has been abused as a general text processor. It will choke on input which does not obey C's lexical rules. For example, apostrophes will be interpreted as the beginning of character constants, and cause errors. Also, you cannot rely on it preserving characteristics of the input which are not significant to C-family languages. If a Makefile is preprocessed, all the hard tabs will be removed, and the Makefile will not work.


The preprocessor creates preprocessing tokens, which are later converted into C tokens.

In general the conversion is quite direct, but not always. For example, if you have a conditional preprocessing directive that evaluates to false as in

#if 0
   comments
#endif

then in place of comments you can write almost anything you want: the text is split into preprocessing tokens that are never converted into C tokens. This lets you keep text that is not valid C inside a C source file without commenting it out.
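A minimal sketch of this, assuming a typical C compiler: the group under #if 0 contains text that could never parse as C, yet the file still compiles. The function names answer and check_answer are hypothetical, chosen only for this example.

```c
#include <assert.h>

#if 0
This group is never translated as C, so it does not have to parse:
    return )( 42 +* ;
It is still split into preprocessing tokens, so a stray apostrophe
or double quote is best avoided here.
#endif

/* Ordinary C resumes after the skipped group. */
int answer(void) {
    return 42;
}

/* Sanity check that the skipped text had no effect on the program. */
void check_answer(void) {
    assert(answer() == 42);
}
```

The skipped text only has to obey the lexical rules of the preprocessor, not the phrase structure grammar of C.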

The only link between the language of the preprocessor and C is that many tokens are defined in almost the same way in both, though not all of them.

For example, preprocessing numbers (called pp-numbers in the ISO 9899 standard) such as 4MD are valid for the preprocessor but are not valid C numbers. Using the ## operator, you can build a valid C identifier from such a preprocessing number. For example:

#define version 4A
#define name TEST_
#define PASTE(x, y) x##y
#define VERSION(x, y) PASTE(x, y)
VERSION(name, version) <= expands to TEST_4A, a valid C identifier
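One detail worth keeping in mind: ## pastes its operands before they are macro-expanded, so a macro that pastes directly yields nameversion, and an extra level of indirection is the usual way to obtain TEST_4A. The sketch below makes both results visible by stringifying them. PASTE, JOIN, STR, XSTR and the two functions are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <string.h>

#define version 4A
#define name TEST_

/* Direct paste: ## sees the argument tokens unexpanded. */
#define PASTE(x, y) x##y
/* Indirect paste: the arguments are macro-expanded first, then pasted. */
#define JOIN(x, y) PASTE(x, y)

/* Two-level stringification, so the result of a macro call can be inspected. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Pastes the macro names themselves, giving "nameversion". */
const char *direct_paste(void) {
    return XSTR(PASTE(name, version));
}

/* Pastes the expanded values TEST_ and 4A, giving "TEST_4A". */
const char *indirect_paste(void) {
    return XSTR(JOIN(name, version));
}

/* Undefine the short lowercase macros so they cannot clash with later code. */
#undef version
#undef name
```

In both cases the pasted result is a single identifier token, even though 4A on its own is only a pp-number.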

The preprocessor was conceived as a text-translation tool applicable to any language, not with only C in mind. In C, it is mainly useful for keeping a clear separation between interfaces and implementations.

alinsoar answered Oct 16 '22 09:10
