What exactly is a translation unit in C

The commonly used definition of a translation unit is what comes out of preprocessing (header file inclusion, macro expansion, etc., applied to the source file). This definition is reasonably clear, and the C standard (C11, 5.1.1.1) says:

A C program need not all be translated at the same time. The text of the program is kept in units called source files, (or preprocessing files) in this International Standard. A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit. After preprocessing, a preprocessing translation unit is called a translation unit.

Reading the first sentence more closely:

A C program need not all be translated at the same time.

which implies (to my reading) that a C program can be translated all at the same time, without necessarily being split into multiple preprocessing source files. Also, at the end of the same paragraph, the standard says:

Translation units may be separately translated and then later linked to produce an executable program.

which can be (and typically is) interpreted as compiling individual object files and then linking them to produce a single executable program. However, one can make a question out of the above statement and ask: does it mean an implementation is free to consider multiple source files as a single translation unit, especially for an invocation like:

gcc file1.c file2.c -o out

where the compiler has access to the entire source?

In particular, if an implementation treats file1.c + file2.c (above) as a single translation unit, can it be considered non-conforming?

P.P asked Feb 16 '17 00:02




4 Answers

In the second line you quoted:

The text of the program is kept in units called source files, (or preprocessing files) in this International Standard

If there are two source files then there are two preprocessing files, and therefore two preprocessing translation units, and therefore two translation units. One corresponding to each source file.

The standard doesn't define source file. I guess the compiler could say "I'm making up my own version of 'source file' by declaring that file1.c and file2.c are not source files after all!" and concatenate them, but this would be at odds with programmer expectations. I think you would have a hard time arguing that file1.c is not a source file.

M.M answered Oct 27 '22 17:10


However, one can make a question out of the above statement and ask: does it mean an implementation is free to consider multiple source files as a single translation unit

No. The definition is clear:

A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit. After preprocessing, a preprocessing translation unit is called a translation unit.

A translation unit is the result of preprocessing one source file and its includes. The fact that you might translate two translation units at the same time doesn't mean you can treat them as one translation unit.

user2357112 supports Monica answered Oct 27 '22 18:10


Compilers are free to translate several source files at the same time, but they cannot change their semantics.

Translating several files together will likely be somewhat faster (because the compiler starts only once) and will permit better whole-program optimization: the source code of functions defined in other translation units is then available at the point of call. The compiler can inspect the called code and use the information, much as it can within a single translation unit. From the gcc 6.3.0 manual:

The compiler performs optimization based on the knowledge it has of the program. Compiling multiple files at once to a single output file mode allows the compiler to use information gained from all of the files when compiling each of them.

Called functions can be inspected for absence of aliasing, factual const-ness of pointed-to objects, etc., enabling the compiler to perform optimizations that would be wrong in the general case.

And, of course, such functions can be inlined.
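As a rough illustration of the "factual const-ness" point, here is a minimal single-file sketch. The names sum_values and caller are hypothetical, and the comments mark where the two functions would live if they were split across file1.c and file2.c.

```c
#include <stddef.h>

/* Hypothetical contents of file2.c. The parameter is declared as a pointer
 * to const, but that alone does not promise the caller that the array is
 * left untouched: the callee could cast the const away, which is allowed
 * as long as the underlying object is not itself const. */
int sum_values(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Hypothetical contents of file1.c. */
int caller(void)
{
    int data[3] = {1, 2, 3};
    int first_before = data[0];
    int total = sum_values(data, 3);
    /* If only a declaration of sum_values is visible, the compiler has to
     * assume data[0] may have changed and reload it here. If the body is
     * visible, it can see that data is only read and may fold the
     * difference below to zero. */
    return total + (data[0] - first_before);
}
```

Either way the observable result is the same; only the generated code may differ, which is exactly why this kind of optimization does not change the semantics of the translation units involved.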

But there are semantics of (preprocessing) translation units (which correspond to source files after preprocessing, per your standard quote) which the compiler must respect. Malcolm McLean's answer mentions one: file-static variables. My gut feeling is that there may be other, more subtle issues concerning declarations and declaration order.

Another obvious scoping issue concerns macro definitions. From the N1570 draft, 6.10.3.5:

A macro definition lasts (independent of block structure) until a corresponding #undef directive is encountered or (if none is encountered) until the end of the preprocessing translation unit.

Both issues forbid simple C source file concatenation; the compiler must additionally apply some rudimentary logic.
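The macro part of this can be sketched in a single file. This is only an illustration under stated assumptions: BUFFER_SIZE and both function names are hypothetical, and the #undef stands in for the end of the first preprocessing translation unit, since one file cannot actually contain two of them.

```c
/* Hypothetical contents of file1.c. */
#define BUFFER_SIZE 16

int buffer_size_in_file1(void)
{
    return BUFFER_SIZE;   /* the macro is still defined here */
}

/* In a real build the preprocessing translation unit for file1.c ends
 * here, and the definition of BUFFER_SIZE ends with it. In this
 * single-file sketch the #undef stands in for that boundary. */
#undef BUFFER_SIZE

/* Hypothetical contents of file2.c. */
int file2_sees_buffer_size(void)
{
#ifdef BUFFER_SIZE
    return 1;   /* what plain concatenation, without the #undef, would give */
#else
    return 0;   /* what separate translation units give */
#endif
}
```

Removing the #undef line makes file2_sees_buffer_size return 1, which is the kind of leakage a compiler would have to prevent if it processed both files in one go.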

Peter - Reinstate Monica answered Oct 27 '22 18:10


To all intents and purposes, a translation unit is a .c file together with the .h files it includes. Only rarely are #include directives used to pull in other file types or other .c files.

Variables and functions declared static at file scope are visible only within their translation unit. It's very common to have a few public functions with external linkage and many static functions and data items to support them. So a C translation unit is a bit like a singleton C++ class. If the compiler doesn't handle static correctly, it is non-conforming.
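A minimal sketch of that pattern, assuming a hypothetical counter.c (all names here are made up for illustration): the static items are private to the translation unit, and only the two non-static functions can be called from other files.

```c
/* counter.c (hypothetical): one translation unit acting like a small module. */

/* Internal linkage: these names are not visible to other translation units. */
static int count = 0;
static const int limit = 3;

static int clamp_to_limit(int value)
{
    return value > limit ? limit : value;
}

/* External linkage: the public interface of the module. */
int counter_increment(void)
{
    count = clamp_to_limit(count + 1);
    return count;
}

int counter_get(void)
{
    return count;
}
```

Another source file could define its own static count or clamp_to_limit without any conflict, which is one reason a compiler cannot simply merge two source files into a single translation unit.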

Typically one object file is created for each translation unit, and they are then linked by the linker. That's not actually mandated by the standard but is the natural and obvious way to do things in an environment where files are cheap to create and compiling is relatively slow.

Malcolm McLean answered Oct 27 '22 18:10