Separate compilation units vs a single compilation unit for faster compilation, linking, and optimised code?

Tags:

c++

There are several questions which talk about why we should have separate compilation units to improve compile times (for example, not including any code in the hpp files, but only in the cpp files).

But then I found this question:

#include all .cpp files into a single compilation unit?

If we can ignore the question of maintainability and just look at compile/link times, as well as at optimising the code, what would be the benefits and pitfalls of having just one hpp and one cpp file?

Note that the post I linked to talks about a single cpp file (while there are many header files). I'm asking what happens if we have just one hpp file and one cpp file.

EDIT: If we ignore the fact that changing a single line will cause the entire code to be recompiled, will it still be faster than if thousands of separate files are recompiled from scratch?

EDIT: I am not interested in a discussion about maintainability. I'm trying to understand what makes a compiler compile faster. This question has nothing to do with what is practical; it is about understanding a simple matter:

Will one large hpp and cpp file compile faster than the same code split across many hpp and cpp files, using a single core?

EDIT: I think people are getting sidetracked into discussing what is practical and what one SHOULD do. This question is not about what one SHOULD do; it is simply to help me understand what the compiler is doing under the hood. So far no one has answered that question, and the discussion has instead been about whether it is practical or not.

EDIT: Besides the one person who actually tried to answer this question, I feel this question hasn't received the attention it deserves and is being unnecessarily downvoted. SO is about sharing information, not punishing questions because the people asking don't already know the answer.

asked Sep 12 '16 by Rahul Iyer

1 Answer

It is compiler-specific, and depends upon the optimizations you ask your compiler for.

Most recent free-software C++11 (or C++14) compilers are able to do link-time optimization: both recent GCC and Clang/LLVM accept the -flto flag (for link-time optimization). To use it, you should compile and link your code with it, plus the same additional optimization flags. A typical use through the make builder could be:

make 'CXX=g++ -flto -O2' 

or, in separate commands:

g++ -flto -O2 -Wall -I/usr/local/include -c src1.cc
g++ -flto -O2 -Wall -I/usr/local/include -c src2.cc
g++ -flto -O2 -Wall src1.o src2.o -L/usr/local/lib -lsome -o binprog

Don't forget -flto -O2 at link time!

Then the code is compiled nearly as if you had put all of src1.cc and src2.cc in the same compilation unit. In particular, the compiler is able to (and sometimes will) inline a call from a function in src1.cc to a function in src2.cc.
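
For instance, here is a minimal sketch (with hypothetical function names) of a cross-unit call that LTO can inline; without -flto it would remain an ordinary call resolved by the linker:

// src2.cc -- defines a small helper in its own translation unit
int twice(int x) { return 2 * x; }

// src1.cc -- sees only a declaration of twice(); with -flto the
// compiler may nevertheless inline the call across the two files
int twice(int x);
int quadruple(int x) { return twice(twice(x)); }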

What happens under the hood with -flto (with GCC, but it is similar in principle in Clang) is that the compiler puts an intermediate representation (in some GIMPLE/SSA form) of your source code into each object file. At "link time" (actually done by the compiler too, not only the linker) this intermediate representation is reloaded, processed, and recompiled for the entire program. So the compilation time nearly doubles.

So -flto slows down the compilation (approximately by a factor of two) and might sometimes give a few percent of performance improvement (in the execution time of the produced binary). Hence I almost never use it.

I'm trying to understand what makes a compiler compile faster.

This is compiler-specific, and depends a lot on the optimizations you are asking from it. With a recent GCC 5 or GCC 6 and g++ -O2 (and IIRC also with clang++ -O2), practical, empirical measurements show that compilation time is proportional not only to the total size of the compilation unit (e.g. the number of tokens, or the size of the AST produced after preprocessing, include and macro expansion, and even template expansion) but also to the square of the size of the biggest function. A possible explanation is the time complexity of register allocation and instruction scheduling. Notice that the standard headers of the C++11 or C++14 containers expand to something quite big (e.g. #include <vector> gives about ten thousand lines). BTW, compiling with g++ -O0 is faster than with g++ -O1, which is faster than g++ -O2. And asking for debug information (e.g. g++ -g2) slows down the compiler. So g++ -O1 -g2 gives a slower compilation than g++ -O0 (which would produce a slower executable).
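
You can observe that header expansion yourself; here is a hedged sketch (the file name is made up, and the exact line count varies with the compiler and library version):

// just_vector.cc -- a nearly empty file that still drags in <vector>
#include <vector>
std::vector<int> v; // forces the whole header to be processed

// To see how much text the compiler actually parses, preprocess it:
//   g++ -E just_vector.cc | wc -l
// On a typical GCC/libstdc++ installation this prints on the order of
// ten thousand lines, even though the source above is three lines long.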

Precompiled headers might help reduce the compilation time (but not always!). You would have a single common header, and you had better avoid very small compilation units: total compilation time is slightly lower with 20 *.cc files of about two thousand lines each than with 200 *.cc files of two hundred lines each (notably because header files expand to many tokens in each unit). I generally recommend having at least a thousand lines per *.cc file if possible, so having one tiny file of a hundred lines per class implementation is often a bad idea (in terms of overall compilation time). For a tiny project of e.g. 4KLOC, having a single source file is quite sensible.
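
As a hedged sketch of how a GCC precompiled header is built (the name common.hh is only an illustration):

// common.hh -- the single big header shared by every *.cc file
#include <vector>
#include <map>
#include <string>
// ... project-wide declarations ...

// Precompile it once, with the SAME flags used for the real builds:
//   g++ -O2 -std=c++14 common.hh    # produces common.hh.gch
// A *.cc file whose very first include is "common.hh" then loads
// common.hh.gch instead of reparsing all those lines.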

Notice also that C++ template expansion happens very "syntactically" (there are no modules yet in C++; OCaml modules and functors are much better in that respect). In other words, vector<map<string,long>> is "expanded" (and is nearly as compile-time consuming) as if the <vector>, <map> and <string> standard headers had been inserted at the first occurrence of vector<map<string,long>>. Template expansion is essentially an internal rewriting of ASTs. So a vector<map<string,set<long>>> requires, on its first occurrence, a lot of compiler work, and nearly the same amount of work has to be done again for the "similar" vector<map<string,set<double>>>.
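
A hedged illustration of that repeated work (a made-up example; each distinct set of template arguments is a separate instantiation):

#include <vector>
#include <map>
#include <set>
#include <string>

// The compiler does nearly the full expansion work twice here:
std::vector<std::map<std::string, std::set<long>>> a;   // first instantiation
std::vector<std::map<std::string, std::set<double>>> b; // "similar", yet expanded again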

Of course, several compilation units can be compiled in parallel, e.g. with make -j.

To understand where a given GCC compilation is spending its time, pass -ftime-report to g++. To be scared by the complexity of GCC's internal representations, try -fdump-tree-all once.

To speed up your overall compilation time (with a focus on Linux systems with GCC, but you could adapt this answer to your system):

  • have a parallel build (e.g. make -j will run several g++ processes in parallel, roughly one per translation unit, i.e. per *.cc file). Learn to write good enough Makefiles.

  • consider having one common header file and precompiling it (but that can sometimes slow down the build, so you need to benchmark); if you keep several header files (and there are many good reasons to do so), avoid having too many tiny ones and prefer fewer, bigger ones. A single common precompiled header file of nearly ten thousand lines is not unusual (and it may #include several other files).

  • consider having larger source files; e.g. 20 source files of 2000 lines each might compile faster than 200 source files of 200 lines each (because with many small source files, preprocessing and template expansion are repeated more often), and I sometimes have source files of nearly ten thousand lines. However, you'll often do an incremental build, and then this advice can backfire; YMMV, so you need to benchmark.

  • disable optimizations, or lower the optimization level, so compile with g++ -O0 or g++ -O1 instead of g++ -O2, and avoid -flto. In many (but not all) cases, g++ -O3 (with or without -flto) is not worth the effort: it compiles more slowly, and the resulting machine code is not significantly faster than with g++ -O2. But YMMV; some numerical computations profit a lot from -O3. You could consider using function-specific pragmas or attributes to optimize some functions more than others within the same *.cc source file.

  • disable debugging information, or lower its level, so compile with g++ -O1 instead of g++ -O1 -g2; but richer debugging information (e.g. g++ -g3) is very useful with the gdb debugger, so YMMV.

  • you could disable warnings, but that is not worth the trouble. On the contrary, always enable them: pass at least -Wall to g++, probably also -Wextra, and make sure your code compiles without warnings.

  • avoid using too many nested templates, like e.g. std::set<std::vector<std::map<std::string,long>>>; in some cases having opaque pointers and using the PIMPL idiom could help (see the sketch after this list). You might then include some extra headers (e.g. for containers) only in some *.cc files and not in all of them (but this works against a single precompiled header, so YMMV).
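
As a hedged sketch of the PIMPL idiom mentioned above (class and file names are hypothetical): the header needs only a forward declaration, so the heavy container headers are parsed in a single *.cc file instead of in every file that includes the header.

// widget.hh -- no container headers needed here
#include <memory>
class Widget {
public:
  Widget();
  ~Widget();                 // must be defined where Impl is complete
  void add(long value);
private:
  struct Impl;               // opaque: defined only in widget.cc
  std::unique_ptr<Impl> pimpl;
};

// widget.cc -- the only file that pays for <map>, <set> and <string>
#include "widget.hh"
#include <map>
#include <set>
#include <string>
struct Widget::Impl {
  std::map<std::string, std::set<long>> data;
};
Widget::Widget() : pimpl(new Impl) {}
Widget::~Widget() = default;
void Widget::add(long value) { pimpl->data["default"].insert(value); }

The trade-off: every call into Widget now goes through an extra pointer indirection, but touching the container code recompiles only widget.cc.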

Some compilers or compiler versions are slightly faster than others, so you could prefer clang++ to g++. I do recommend using several compilers (with warnings enabled), and beware of undefined behavior in your code.

Notice that C++ is unlike Java: you can and often should have several classes or functions per file. Again YMMV.

PS. See the slides and documentation (and follow the many links) on starynkevitch.net/Basile/gcc-melt for more about GCC internals. I abandoned GCC MELT in 2018, but the slides are still useful.

answered Oct 07 '22 by Basile Starynkevitch