Suppose we have two .c files and a .h file: main.c, sub.c and sub.h, where
main.c:
#include "sub.h"
...
sub.c:
#include "sub.h"
...
We can compile the program either with i)
gcc -o a.out main.c sub.c
or ii)
gcc -c main.c
gcc -c sub.c
gcc -o a.out main.o sub.o
In this case, does the preprocessor output one translation unit or two?
I am confused because main.c includes sub.h, which suggests that the preprocessor would output one compilation unit. On the other hand, two object files, main.o and sub.o, are created before the executable, which makes me think "two source files, thus two translation units."
Which part am I misunderstanding, or where am I making a mistake?
Consider the generation of an executable as a two-step process. First, each translation unit is compiled to an object file; let's call this step the compiler. Second, the object files are linked together into an executable program; let's call this step the linker.
"Translation unit" is a matter of the first step. Each file at which compilation starts (i.e. which is passed to the compiler) gives rise to one translation unit. In most IDEs, there are rules declaring that each file with the extension .c or .cpp is passed as input to the compiler, whereas other files are not. So files with the extensions .h, .hpp or .txt are typically not passed to the compiler directly.
In your example, main.c and sub.c are translation units (both are passed to gcc), whereas sub.h is not a translation unit by itself (it is only "included" in other translation units and considered in the course of their compilation).
So you get two object files, one for each translation unit. These two object files are then considered by the linker.
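As a rough illustration of why sub.h never becomes a translation unit of its own, here is a sketch of what the compiler would see for sub.c once the preprocessor has replaced #include "sub.h" with the header's text. The function name sub_add and the header contents are assumptions made up for this example; the question does not show what sub.h actually declares.

```c
/* Sketch of the translation unit produced from sub.c, shown after
 * preprocessing. The names below are hypothetical; the original question
 * does not show the contents of sub.h or sub.c. */

/* --- text pasted in from sub.h by #include "sub.h" (assumed contents) --- */
int sub_add(int a, int b);      /* declaration only, no code is generated */

/* --- text from sub.c itself --- */
int sub_add(int a, int b)       /* definition, this is what ends up in sub.o */
{
    return a + b;
}
```

A file like this has no main, yet gcc -c accepts it, because each translation unit is translated on its own and only the final link step needs main.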
Note that even a .h file might contain a complete program; but unless you configure your environment so that this .h file is compiled on its own, it won't generate an object file.
Here's what the C standard has to say about that:
A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit. After preprocessing, a preprocessing translation unit is called a translation unit. [..] Previously translated translation units may be preserved individually or in libraries. The separate translation units of a program communicate by (for example) calls to functions whose identifiers have external linkage, manipulation of objects whose identifiers have external linkage, or manipulation of data files. Translation units may be separately translated and then later linked to produce an executable program.
(Source: C99 draft standard, 5.1.1.1 §1)
So in both of your cases you have two translation units. One of them comes from the compiler preprocessing main.c and everything that is included through #include directives, that is, sub.h and probably <stdio.h> and other headers. The second comes from the compiler doing the same thing with sub.c.
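The way the two translation units "communicate" can be sketched as follows. To keep the example compilable as a single file, both halves are shown together, with comments marking which file each part would live in. The names sub_scale, sub_calls and scale_twice are hypothetical and not taken from the question.

```c
/* Both halves are shown in one file only so the sketch compiles on its own.
 * In the layout from the question they would be two translation units.
 * All identifiers here are hypothetical. */

/* ---- would live in sub.h, included by both main.c and sub.c ---- */
extern int sub_calls;           /* object with external linkage */
int sub_scale(int x);           /* function with external linkage */

/* ---- would live in sub.c (second translation unit) ---- */
int sub_calls = 0;              /* the one definition of the object */

int sub_scale(int x)
{
    sub_calls++;                /* count how often the function is called */
    return 2 * x;
}

/* ---- would live in main.c (first translation unit) ---- */
int scale_twice(int x)          /* stands in for code called from main */
{
    /* Only the declarations from sub.h are needed to compile this part;
     * the linker later resolves sub_scale to the definition in sub.o. */
    return sub_scale(sub_scale(x));
}
```

When the parts are split across main.c and sub.c, the compiler never sees both definitions at once; the shared declarations from sub.h are what let each unit be translated separately and still link.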
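The difference between your first and second examples is that in the latter you explicitly store the separately translated translation units as object files. In the former, gcc still translates each unit separately but discards the intermediate object files after linking.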
Notice that there is no rule tying one object file to exactly one translation unit. The GNU linker, for example, is capable of joining two .o files together into a single one (a relocatable link, ld -r).
The standard, as far as I know, does not specify the extension of source files. In practice, you are free to #include a .c file into another one, or to place your entire program in a .h file. With gcc you can use the option -x c to force a .h file to be treated as the starting point of a translation unit.
The distinction made here:
A source file together with all the headers and source files included via the preprocessing directive #include [...]
is because a header need not be a source file. Similarly, the contents of <...> in an #include directive need not be a valid file name. How exactly the compiler uses the named headers <...> and "..." is implementation-defined.