What techniques can be used to speed up C++ compilation times?
This question came up in some comments to Stack Overflow question C++ programming style, and I'm interested to hear what ideas there are.
I've seen a related question, Why does C++ compilation take so long?, but that doesn't provide many solutions.
Zapcc, a caching C++ compiler based on Clang, was the fastest compiler in our compile test.
Header files are probably the main reason compilation is slow: huge amounts of code must be compiled for every compilation unit, and every header is compiled multiple times (once for every compilation unit that includes it).
Take a look at the Pimpl idiom, also known as an opaque pointer or handle class. Not only does it speed up compilation, it also increases exception safety when combined with a non-throwing swap function. The Pimpl idiom lets you reduce the dependencies between headers and reduces the amount of recompilation that needs to be done.
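A minimal sketch of the idiom (the Widget/Impl names are made up for illustration): the header exposes only an opaque pointer, and the heavy includes move into the implementation file.

// widget.hpp -- no implementation details, so dependents rarely need to recompile
#include <memory>

class Widget {
public:
    Widget();
    ~Widget();                      // defined in widget.cc, where Impl is complete
    void draw() const;
private:
    struct Impl;                    // forward declaration only
    std::unique_ptr<Impl> pimpl_;   // opaque pointer to the real implementation
};

// widget.cc -- heavy headers are included (and recompiled) only here
#include "widget.hpp"
#include <iostream>

struct Widget::Impl {
    void draw() const { std::cout << "drawing\n"; }
};

Widget::Widget() : pimpl_(new Impl) { }
Widget::~Widget() = default;
void Widget::draw() const { pimpl_->draw(); }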
Wherever possible, use forward declarations. If the compiler only needs to know that SomeIdentifier is a struct or a pointer or whatever, don't include the entire definition, forcing the compiler to do more work than it needs to. This can have a cascading effect, making builds far slower than they need to be.
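For instance (Logger and Engine are hypothetical names), a header that only stores a pointer or takes a reference does not need the full definition:

// engine.hpp -- a forward declaration is enough for pointers and references
class Logger;                     // instead of #include "logger.hpp"

class Engine {
public:
    void attach(Logger& logger);  // reference parameter: definition not needed here
private:
    Logger* logger_ = nullptr;    // pointer member: definition not needed here
};

The full #include of the Logger header then moves into engine.cc, so changes to it no longer force every user of engine.hpp to recompile.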
The I/O streams are particularly known for slowing down builds. If you need them in a header file, #include <iosfwd> instead of <iostream>, and #include <iostream> in the implementation file only. The <iosfwd> header holds forward declarations only. Unfortunately, the other standard headers don't have a respective declarations header.
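As a sketch (report.hpp and report.cc are made-up names):

// report.hpp -- <iosfwd> declares std::ostream, which is enough for a reference
#include <iosfwd>

void print_report(std::ostream& out);

// report.cc -- the full <iostream> definition is needed only here
#include "report.hpp"
#include <iostream>

void print_report(std::ostream& out) { out << "report\n"; }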
Prefer pass-by-reference to pass-by-value in function signatures. This will eliminate the need to #include the respective type definitions in the header file and you will only need to forward-declare the type. Of course, prefer const references to non-const references to avoid obscure bugs, but this is an issue for another question.
Use guard conditions to keep header files from being included more than once in a single translation unit.
#pragma once
#ifndef filename_h
#define filename_h

// Header declarations / definitions

#endif
By using both the pragma and the ifndef, you get the portability of the plain macro solution, as well as the compilation speed optimization that some compilers can do in the presence of the #pragma once directive.
The more modular and less interdependent your code design is in general, the less often you will have to recompile everything. You also reduce the amount of work the compiler has to do on any individual block, since it has less to keep track of.
Precompiled headers are used to compile a common section of included headers once for many translation units. The compiler compiles it once and saves its internal state. That state can then be loaded quickly to get a head start in compiling another file with that same set of headers.
Be careful that you only include rarely changed stuff in the precompiled headers, or you could end up doing full rebuilds more often than necessary. This is a good place for STL headers and other library include files.
ccache is another utility that takes advantage of caching techniques to speed things up.
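Assuming ccache is installed and g++ is your compiler, a typical way to enable it is to prefix the compiler invocation, or to point a Makefile that uses $(CXX) at it:

ccache g++ -c foo.cc -o foo.o
CXX="ccache g++" make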
Many compilers/IDEs support using multiple cores/CPUs to do compilation simultaneously. In GNU Make (usually used with GCC), use the -j [N] option. In Visual Studio, there's an option under preferences to allow it to build multiple projects in parallel. You can also use the /MP option for file-level parallelism, instead of just project-level parallelism.
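For example, on a Linux machine with GNU Make, the following starts one job per available core (nproc reports the core count):

make -j$(nproc)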
Other parallel build utilities, such as distcc, can distribute compilation across several machines.
The more the compiler tries to optimize, the harder it has to work. Using a lower optimization level (for example, -O0 or -O1 instead of -O2/-O3) for everyday development builds can noticeably cut compile times.
Moving your less frequently modified code into libraries can reduce compile time. By using shared libraries (.so or .dll), you can reduce linking time as well.
More RAM, faster hard drives (including SSDs), and more CPUs/cores will all make a difference in compilation speed.
I work on the STAPL project, which is a heavily templated C++ library. Once in a while, we have to revisit all the techniques to reduce compilation time. Here, I have summarized the techniques we use. Some of these techniques are already listed above:
Although there is no proven correlation between symbol lengths and compilation time, we have observed that smaller average symbol sizes can improve compilation time on all compilers. So your first goal is to find the largest symbols in your code.
You can use the nm command to list the symbols based on their sizes:
nm --print-size --size-sort --radix=d YOUR_BINARY
In this command, --radix=d lets you see the sizes in decimal (the default is hexadecimal). Now, looking at the largest symbols, identify whether you can break up the corresponding class, and try to redesign it by factoring the non-templated parts into a base class, or by splitting the class into multiple classes.
You can also run the regular nm command and pipe it to your favorite script (AWK, Python, etc.) to sort the symbols by their length. In our experience, this method identifies the largest troublemaking candidates better than sorting by size.
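One possible pipeline, assuming GNU nm, c++filt, and awk (the field layout may differ on other platforms): strip the address and type columns, then sort by the length of the demangled name.

nm --defined-only YOUR_BINARY | c++filt \
  | awk '{ sub(/^[0-9a-fA-F]* +[A-Za-z] +/, ""); print length($0), $0 }' \
  | sort -rn | head -20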
"Templight is a Clang-based tool to profile the time and memory consumption of template instantiations and to perform interactive debugging sessions to gain introspection into the template instantiation process".
You can install Templight by checking out LLVM and Clang (see their build instructions) and applying the Templight patch on top. The default build settings for LLVM and Clang have debug mode and assertions turned on, and these can impact your compilation time significantly. It does seem like Templight needs both, so you have to use the default settings. The process of installing LLVM and Clang should take about an hour or so.
After applying the patch, you can use templight++, located in the build folder you specified during installation, to compile your code. Make sure that templight++ is in your PATH. Now, to compile, add the following switches to the CXXFLAGS in your Makefile or to your command-line options:
CXXFLAGS+=-Xtemplight -profiler -Xtemplight -memory -Xtemplight -ignore-system
Or
templight++ -Xtemplight -profiler -Xtemplight -memory -Xtemplight -ignore-system
After compilation is done, you will have a .trace.memory.pbf and a .trace.pbf generated in the same folder. To visualize these traces, you can use the Templight Tools, which can convert them to other formats; follow their instructions to install templight-convert. We usually use the callgrind output. You can also use the GraphViz output if your project is small:
$ templight-convert --format callgrind YOUR_BINARY --output YOUR_BINARY.trace
$ templight-convert --format graphviz YOUR_BINARY --output YOUR_BINARY.dot
The generated callgrind file can be opened with KCachegrind, in which you can trace the most time- and memory-consuming instantiations.
Although there is no exact solution for reducing the number of template instantiations, a few guidelines can help:
For example, if you have a class,
template <typename T, typename U> struct foo { };
and both T and U can have 10 different options, you have increased the possible template instantiations of this class to 100. One way to resolve this is to abstract the common part of the code into a different class. The other method is inheritance inversion (reversing the class hierarchy), but make sure that your design goals are not compromised before using this technique.
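A sketch of the first approach (the names are hypothetical): hoist everything that does not depend on T and U into a non-template base class, so it is compiled only once instead of once per instantiation.

// Compiled once, no matter how many <T, U> combinations are used
struct foo_base {
    void common_bookkeeping() { /* type-independent logic lives here */ }
};

template <typename T, typename U>
struct foo : foo_base {
    // Only the genuinely T/U-dependent code remains in the template
    void process(const T&, const U&) { common_bookkeeping(); }
};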
Using this technique, you can compile the common section once and link it with your other TUs (translation units) later on.
If you know all the possible instantiations of a class, you can use explicit template instantiation to compile all the cases in a separate translation unit.
For example, in:
enum class PossibleChoices {Option1, Option2, Option3};

template <PossibleChoices pc>
struct foo { };
We know that this class can have three possible instantiations:
template class foo<PossibleChoices::Option1>;
template class foo<PossibleChoices::Option2>;
template class foo<PossibleChoices::Option3>;
Put the above in a translation unit and use the extern keyword in your header file, below the class definition:
extern template class foo<PossibleChoices::Option1>;
extern template class foo<PossibleChoices::Option2>;
extern template class foo<PossibleChoices::Option3>;
This technique can save you time if you are compiling different tests with a common set of instantiations.
NOTE: MPICH2 ignores the explicit instantiation at this point and always compiles the instantiated classes in all compilation units.
The whole idea behind unity builds is to include all the .cc files that you use in one file and compile that file only once. Using this method, you can avoid reinstantiating common sections of different files and if your project includes a lot of common files, you probably would save on disk accesses as well.
As an example, let's assume you have three files, foo1.cc, foo2.cc, and foo3.cc, and they all include tuple from the STL. You can create a foo-all.cc that looks like:
#include "foo1.cc" #include "foo2.cc" #include "foo3.cc"
You compile this file only once and potentially reduce the common instantiations among the three files. It is hard to generally predict if the improvement can be significant or not. But one evident fact is that you would lose parallelism in your builds (you can no longer compile the three files at the same time).
Further, if any of these files happen to take a lot of memory, you might actually run out of memory before the compilation is over. On some compilers, such as GCC, this might ICE (Internal Compiler Error) your compiler for lack of memory. So don't use this technique unless you know all the pros and cons.
Precompiled headers (PCHs) can save you a lot of time in compilation by compiling your header files to an intermediate representation recognizable by a compiler. To generate precompiled header files, you only need to compile your header file with your regular compilation command. For example, on GCC:
$ g++ YOUR_HEADER.hpp
This will generate a YOUR_HEADER.hpp.gch file (.gch is the extension for PCH files in GCC) in the same folder. This means that if you #include YOUR_HEADER.hpp in some other file and the compiler finds YOUR_HEADER.hpp.gch in the same folder, it will use the .gch file instead of parsing YOUR_HEADER.hpp again.
There are two issues with this technique: the precompiled headers must be stable (only include rarely changed files in them), and most compilers let you use only one PCH per compilation unit. So if you have more than one header to precompile, you have to include them all in one file (e.g., all-my-headers.hpp). But that means that you have to include the new file in all places. Fortunately, GCC has a solution for this problem: use -include and give it the new header file; the option can be repeated to include several files. For example:
g++ foo.cc -include all-my-headers.hpp
Unnamed namespaces (a.k.a. anonymous namespaces) can reduce the generated binary sizes significantly. Unnamed namespaces use internal linkage, meaning that the symbols generated in those namespaces will not be visible to other TUs (translation units). Compilers usually generate unique names for unnamed namespaces. This means that if you have a file foo.hpp:
namespace {
  template <typename T>
  struct foo { };
} // Anonymous namespace

using A = foo<int>;
and you happen to include this file in two TUs (two .cc files compiled separately), the two foo template instances will not be the same. This violates the One Definition Rule (ODR). For the same reason, using unnamed namespaces is discouraged in header files. Feel free to use them in your .cc files to avoid symbols showing up in your binary files. In some cases, moving all the internal details of a .cc file into an unnamed namespace showed a 10% reduction in the generated binary sizes.
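For example, a sketch of the safe pattern (clamp_to_limit is a hypothetical helper) keeps the helper local to one .cc file:

// foo.cc -- helpers that no other TU needs to see
namespace {
    int clamp_to_limit(int value) { return value > 100 ? 100 : value; }
} // unnamed namespace: internal linkage, no exported symbol

int exported_api(int value) { return clamp_to_limit(value); }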
In newer compilers you can select your symbols to be either visible or invisible in the Dynamic Shared Objects (DSOs). Ideally, changing the visibility can improve compiler performance, link time optimizations (LTOs), and generated binary sizes. If you look at the STL header files in GCC you can see that it is widely used. To enable visibility choices, you need to change your code per function, per class, per variable and more importantly per compiler.
With the help of visibility, you can hide the symbols that you consider private from the generated shared objects. On GCC, you can control the default visibility of symbols by passing default or hidden to the -fvisibility option of your compiler. This is in some sense similar to the unnamed namespace, but in a more elaborate and intrusive way.
If you would like to specify the visibilities per case, you have to add the following attributes to your functions, variables, and classes:
__attribute__((visibility("default"))) void foo1() { }
__attribute__((visibility("hidden"))) void foo2() { }
class __attribute__((visibility("hidden"))) foo3 { };
void foo4() { }
The default visibility in GCC is default (public), meaning that if you compile the above into a shared library (-shared), foo2 and class foo3 will not be visible in other TUs (foo1 and foo4 will be visible). If you compile with -fvisibility=hidden, then only foo1 will be visible; even foo4 would be hidden.
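For example, to build the code above as a shared library with symbols hidden by default (the file names are made up):

g++ -shared -fPIC -fvisibility=hidden foo.cc -o libfoo.so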
You can read more about visibility on the GCC wiki.