Why does the C preprocessor interpret the word "linux" as the constant "1"?

Tags: c, linux, c-preprocessor, gcc

Why does the C preprocessor in GCC interpret the word linux (small letters) as the constant 1?

test.c:

    #include <stdio.h>

    int main(void) {
        int linux = 5;
        return 0;
    }

Result of $ gcc -E test.c (stop after the preprocessing stage):

    ....
    int main(void) {
        int 1 = 5;
        return 0;
    }

Which of course yields an error.

(BTW: There is no #define linux in the stdio.h file.)


asked Oct 06 '13 16:10

ahmedaly50

2 Answers

In the Old Days (pre-ANSI), predefining symbols such as unix and vax was a way to allow code to detect at compile time what system it was being compiled for. There was no official language standard back then (beyond the reference material at the back of the first edition of K&R), and C code of any complexity was typically a complex maze of #ifdefs to allow for differences between systems. These macro definitions were generally set by the compiler itself, not defined in a library header file. Since there were no real rules about which identifiers could be used by the implementation and which were reserved for programmers, compiler writers felt free to use simple names like unix and assumed that programmers would simply avoid using those names for their own purposes.

The 1989 ANSI C standard introduced rules restricting what symbols an implementation could legally predefine. A macro predefined by the compiler could only have a name starting with two underscores, or with an underscore followed by an uppercase letter, leaving programmers free to use identifiers not matching that pattern and not used in the standard library.

As a result, any compiler that predefines unix or linux is non-conforming, since it will fail to compile perfectly legal code that uses something like int linux = 5;.

As it happens, gcc is non-conforming by default -- but it can be made to conform (reasonably well) with the right command-line options:

    gcc -std=c90 -pedantic ... # or -std=c89 or -ansi
    gcc -std=c99 -pedantic
    gcc -std=c11 -pedantic

See the gcc manual for more details.

According to the GNU C preprocessor manual, these definitions are being phased out, so you shouldn't write code that depends on them. If your program needs to know whether it's being compiled for a Linux target, it can check whether __linux__ is defined (assuming you're using gcc or a compiler that's compatible with it). See the GNU C preprocessor manual for more information.

A largely irrelevant aside: the "Best One Liner" winner of the 1987 International Obfuscated C Code Contest, by David Korn (yes, the author of the Korn Shell), took advantage of the predefined unix macro:

    main() { printf(&unix["\021%six\012\0"],(unix)["have"]+"fun"-0x60);}

It prints "unix", but for reasons that have absolutely nothing to do with the spelling of the macro name.

answered Sep 28 '22 04:09

Keith Thompson

This appears to be an (undocumented) "GNU extension": [correction: I finally found a mention in the docs. See below.]

The following commands use the -dM option to print all preprocessor defines; since the input "file" is empty, the output shows exactly the predefined macros. They were run with gcc-4.7.3 on a standard Ubuntu install. You can see that the preprocessor is standard-aware. In total, there are 243 macros with -std=gnu99 and 240 with -std=c99; I filtered the output for relevance.

    $ cpp --std=c89 -dM < /dev/null | grep linux
    #define __linux 1
    #define __linux__ 1
    #define __gnu_linux__ 1

    $ cpp --std=gnu89 -dM < /dev/null | grep linux
    #define __linux 1
    #define __linux__ 1
    #define __gnu_linux__ 1
    #define linux 1

    $ cpp --std=c99 -dM < /dev/null | grep linux
    #define __linux 1
    #define __linux__ 1
    #define __gnu_linux__ 1

    $ cpp --std=gnu99 -dM < /dev/null | grep linux
    #define __linux 1
    #define __linux__ 1
    #define __gnu_linux__ 1
    #define linux 1

The "gnu standard" versions also #define unix. (Using c11 and gnu11 produces the same results.)

I suppose they had their reasons, but it seems to me to make the default installation of gcc (which compiles C code with -std=gnu89 unless otherwise specified) non-conformant, and -- as in this question -- surprising. Polluting the global namespace with macros whose names don't begin with an underscore is not permitted in a conformant implementation. (6.10.8p2: "Any other predefined macro names shall begin with a leading underscore followed by an uppercase letter or a second underscore," but, as mentioned in Appendix J.5 (portability issues), such names are often predefined.)

When I originally wrote this answer, I wasn't able to find any documentation in gcc about this issue, but I did finally discover it, not in C implementation-defined behaviour nor in C extensions but in the cpp manual section 3.7.3, where it notes that:

We are slowly phasing out all predefined macros which are outside the reserved namespace. You should never use them in new programs…

answered Sep 28 '22 04:09

rici
