Why does the C preprocessor in GCC interpret the word linux (small letters) as the constant 1?
test.c:

    #include <stdio.h>

    int main(void)
    {
        int linux = 5;
        return 0;
    }

Result of $ gcc -E test.c (stop after the preprocessing stage):

    ....
    int main(void)
    {
        int 1 = 5;
        return 0;
    }
Which of course yields an error.
(BTW: There is no #define linux in the stdio.h file.)
The preprocessor provides the ability for the inclusion of header files, macro expansions, conditional compilation, and line control. In many C implementations, it is a separate program invoked by the compiler as the first part of translation.
The C preprocessor is a macro processor that is used automatically by the C compiler to transform your program before actual compilation. It is called a macro processor because it allows you to define macros, which are brief abbreviations for longer constructs.
The double-number-sign or token-pasting operator (##), which is sometimes called the merging or combining operator, is used in both object-like and function-like macros. It permits separate tokens to be joined into a single token, and therefore, can't be the first or last token in the macro definition.
In the Old Days (pre-ANSI), predefining symbols such as unix and vax was a way to allow code to detect at compile time what system it was being compiled for. There was no official language standard back then (beyond the reference material at the back of the first edition of K&R), and C code of any complexity was typically a complex maze of #ifdefs to allow for differences between systems. These macro definitions were generally set by the compiler itself, not defined in a library header file. Since there were no real rules about which identifiers could be used by the implementation and which were reserved for programmers, compiler writers felt free to use simple names like unix and assumed that programmers would simply avoid using those names for their own purposes.
The 1989 ANSI C standard introduced rules restricting what symbols an implementation could legally predefine. A macro predefined by the compiler could only have a name starting with two underscores, or with an underscore followed by an uppercase letter, leaving programmers free to use identifiers not matching that pattern and not used in the standard library.
As a result, any compiler that predefines unix or linux is non-conforming, since it will fail to compile perfectly legal code that uses something like int linux = 5;.
As it happens, gcc is non-conforming by default -- but it can be made to conform (reasonably well) with the right command-line options:
    gcc -std=c90 -pedantic ... # or -std=c89 or -ansi
    gcc -std=c99 -pedantic
    gcc -std=c11 -pedantic
See the gcc manual for more details.
gcc will be phasing out these definitions in future releases, so you shouldn't write code that depends on them. If your program needs to know whether it's being compiled for a Linux target or not, it can check whether __linux__ is defined (assuming you're using gcc or a compiler that's compatible with it). See the GNU C preprocessor manual for more information.
A largely irrelevant aside: the "Best One Liner" winner of the 1987 International Obfuscated C Code Contest, by David Korn (yes, the author of the Korn Shell) took advantage of the predefined unix macro:
main() { printf(&unix["\021%six\012\0"],(unix)["have"]+"fun"-0x60);}
It prints "unix", but for reasons that have absolutely nothing to do with the spelling of the macro name.
This appears to be an (undocumented) "GNU extension": [correction: I finally found a mention in the docs. See below.]
The following command uses the -dM option to print all preprocessor defines; since the input "file" is empty, it shows exactly the predefined macros. It was run with gcc-4.7.3 on a standard Ubuntu install. You can see that the preprocessor is standard-aware. In total, there are 243 macros with -std=gnu99 and 240 with -std=c99; I filtered the output for relevance.
    $ cpp --std=c89 -dM < /dev/null | grep linux
    #define __linux 1
    #define __linux__ 1
    #define __gnu_linux__ 1
    $ cpp --std=gnu89 -dM < /dev/null | grep linux
    #define __linux 1
    #define __linux__ 1
    #define __gnu_linux__ 1
    #define linux 1
    $ cpp --std=c99 -dM < /dev/null | grep linux
    #define __linux 1
    #define __linux__ 1
    #define __gnu_linux__ 1
    $ cpp --std=gnu99 -dM < /dev/null | grep linux
    #define __linux 1
    #define __linux__ 1
    #define __gnu_linux__ 1
    #define linux 1
The "gnu standard" versions also #define unix. (Using c11 and gnu11 produces the same results.)
I suppose they had their reasons, but it seems to me to make the default installation of gcc (which compiles C code with -std=gnu89 unless otherwise specified) non-conformant, and -- as in this question -- surprising. Polluting the global namespace with macros whose names don't begin with an underscore is not permitted in a conformant implementation. (6.10.8p2: "Any other predefined macro names shall begin with a leading underscore followed by an uppercase letter or a second underscore," but, as mentioned in Appendix J.5 (portability issues), such names are often predefined.)
When I originally wrote this answer, I wasn't able to find any documentation in gcc about this issue, but I did finally discover it, not in C implementation-defined behaviour nor in C extensions but in the cpp manual section 3.7.3, where it notes that:
We are slowly phasing out all predefined macros which are outside the reserved namespace. You should never use them in new programs…