
Identify slow-to-compile function

I have some .cpp files that take a long time to compile. They contain some basic classes/code with a few templates, but nothing that would justify compile times on the order of dozens of seconds.

I do use a couple of external libraries (Boost/OpenCV).

This is what gcc says about the compilation time. How can I find the library/include/function call that's to blame for the horrendous compilation time?

Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1445 kB ( 0%) ggc
 phase parsing           :   6.69 (46%) usr   1.61 (60%) sys  12.14 (47%) wall  488430 kB (66%) ggc
 phase lang. deferred    :   1.59 (11%) usr   0.36 (13%) sys   3.83 (15%) wall   92964 kB (13%) ggc
 phase opt and generate  :   6.25 (43%) usr   0.72 (27%) sys  10.09 (39%) wall  152799 kB (21%) ggc
 |name lookup            :   1.05 ( 7%) usr   0.28 (10%) sys   2.01 ( 8%) wall   52063 kB ( 7%) ggc
 |overload resolution    :   0.83 ( 6%) usr   0.18 ( 7%) sys   1.48 ( 6%) wall   42377 kB ( 6%) ggc
...

Profiling the C++ compilation process deals with identifying the slow file, but I need more fine-grained information to find the culprit.

(Other files/projects compile in milliseconds/seconds, so it's not a matter of computer resources. I use gcc 4.9.1)

asked Mar 07 '15 by Sam



2 Answers

There are basically two things that cause long compilation times: too many includes and too many templates.

When you include too many headers, and those headers in turn include too many headers of their own, the compiler has a lot of work to do just to load all of these files, and it will spend an inordinate amount of time on the processing passes it has to run over all that code, whether it is actually used or not: preprocessing, lexical analysis, AST building, etc. This can be especially problematic when code is spread over a large number of small headers, because performance becomes very much I/O bound (lots of time wasted just fetching and reading files from the hard-disk). Unfortunately, Boost libraries tend to be structured very much in this way.

Here are a few ways and tools to tackle this problem:

  • You can use the "include-what-you-use" tool. This is a Clang-based analysis tool that looks at what you actually use in your code and which headers those things come from, and then reports any potential optimizations you could make by removing unnecessary includes, using forward declarations instead, or replacing broad "all-in-one" headers with more fine-grained ones (see the sketch after this list).
  • Most compilers have options to dump the preprocessed source (on GCC / Clang, it's the -E or -E -P options, or you can simply use GCC's C preprocessor program cpp directly). You can take your source file, comment out different include statements or groups of include statements, and dump the preprocessed source to see the total amount of code that these different headers pull in (and maybe use a line-count command, like $ g++ -E -P my_source.cpp | wc -l). This can help you identify, in sheer number of lines of code to process, which headers are the worst offenders. Then, you can see what you can do to avoid them or mitigate the issue somehow.
  • You can also use pre-compiled headers. This is a feature supported by most compilers with which you can specify certain headers (especially oft-included "all-in-one" headers) to be pre-compiled to avoid re-parsing them for every source file that includes them.
  • If your OS supports it, you can use a ram-disk for your code and the headers of your external libraries. This essentially takes part of your RAM and makes it look like a normal disk / file-system. This can significantly reduce compilation times by reducing the I/O latency, since all the headers and source files are read from RAM instead of the actual hard-disk.
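
For example, a header that only stores a pointer or reference to a heavy type usually doesn't need the type's full definition: a forward declaration is enough, and the heavy include moves into the one cpp file that needs it. Here is a minimal sketch of the kind of change include-what-you-use tends to suggest (Widget, Renderer and the file names are made-up examples, not from the question's code):

    // renderer.h -- hypothetical header: a forward declaration of Widget is enough,
    // so widget.h (and any heavy headers it drags in) stays out of this header.
    #ifndef RENDERER_H
    #define RENDERER_H

    class Widget;              // forward declaration, fine for pointers/references

    class Renderer {
    public:
        explicit Renderer(Widget& w);
        void draw() const;
    private:
        Widget* widget_;       // storing a pointer does not require the full definition
    };

    #endif // RENDERER_H

Only renderer.cpp then needs to #include widget.h (and whatever Boost/OpenCV headers it uses), so that cost is paid in one translation unit instead of in every file that includes renderer.h.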

The second problem is that of template instantiations. In your time report from GCC, there should be a time value reported somewhere for the template instantiation phase. If that number is high, which it will be as soon as there is any significant amount of template meta-programming involved in the code, then you will need to work on that problem. There are lots of reasons why some template-heavy code can be painfully slow to compile, including deeply recursive instantiation patterns, overly fancy SFINAE tricks, abuse of type-traits and concept checking, and good old-fashioned over-engineered generic code. But there are also simple tricks that can fix a lot of issues, like using unnamed namespaces (to avoid all the time wasted generating symbols for instantiations that don't really need to be visible outside the translation unit) and specializing type-traits or concept-check templates (to basically "short-circuit" much of the fancy meta-programming that goes into them). Another potential solution is to use "extern templates" (from C++11) to control where specific templates are instantiated (e.g., in a separate cpp file) and avoid re-instantiating them in every translation unit that uses them.
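
As a rough illustration of the extern-template approach (the Matrix class and the file names below are made up for the example, not taken from the question's code): the header declares the instantiation as extern, and exactly one cpp file provides the explicit instantiation definition.

    // matrix.h -- hypothetical class template used throughout the project
    #ifndef MATRIX_H
    #define MATRIX_H

    #include <vector>

    template <typename T>
    class Matrix {
    public:
        Matrix(int rows, int cols) : rows_(rows), cols_(cols), data_(rows * cols) {}
        T& at(int r, int c) { return data_[r * cols_ + c]; }
    private:
        int rows_;
        int cols_;
        std::vector<T> data_;
    };

    // C++11: tell every translation unit that includes this header NOT to
    // instantiate Matrix<double> itself; the instantiation lives elsewhere.
    extern template class Matrix<double>;

    #endif // MATRIX_H

    // matrix.cpp -- the ONE file that actually instantiates Matrix<double>
    #include "matrix.h"

    template class Matrix<double>;   // explicit instantiation definition

Every other file that uses Matrix<double> then just references that single instantiation at link time instead of re-instantiating it.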

Here are a couple of ways or tools to help you identify the bottlenecks:

  • You can use the "Templight" profiling tool (and its auxiliary "Templight-tools" for dealing with the traces). This is again a Clang-based tool that can be used as a drop-in replacement for the Clang compiler (the tool is actually an instrumented full-blown compiler), and it will generate a complete profile of all the template instantiations that occur during compilation, including the time spent on each (and, optionally, memory consumption estimates, although these will affect the timing values). The traces can later be converted to a Callgrind format and visualized in KCacheGrind; see the description of that on the templight-tools page. This can basically be used like a typical run-time profiler, but for profiling the time and memory consumption of compiling template-heavy code.
  • A more rudimentary way of finding the worst offenders is to create test source files that each instantiate a particular template you suspect is responsible for the long compilation times. Then, you compile those files, time them, and work your way (maybe in a "binary search" fashion) towards the worst offenders (see the sketch after this list).
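
For that last approach, the probe file can be as small as this (matrix.h and Matrix are the same hypothetical names from the earlier sketch; substitute whatever template you actually suspect), compiled with g++ -c -ftime-report so you get a per-suspect timing to compare:

    // probe_matrix.cpp -- hypothetical probe: instantiate ONE suspect template
    // and nothing else, then compare compile times between different probes.
    // Build with:  g++ -c -ftime-report probe_matrix.cpp
    #include "matrix.h"

    template class Matrix<float>;   // force the instantiation explicitly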

But even with these tricks, identifying template instantiation bottlenecks is easier than actually solving them. So, good luck with that.

answered Oct 10 '22 by Mikael Persson


This can't be fully answered without information about how your source files are organised and built, so just some general observations.

  1. Template instantiations can increase compile times a lot, particularly if complicated templates are instantiated for several different types/parameters in each of multiple source files. Schemes for explicit template instantiation (i.e. making sure the templates are only instantiated in a few source files rather than all of them) can reduce compilation times in such circumstances (as well as link time, and executable file size). You need to read compiler documentation for how to do this - it does not necessarily occur by default and can mean restructuring your code to support it.
  2. Header files that are #included in many source files, whether needed or not, tend to increase compilation times. I saw one case where a team member wrote a "globals.h" that #included everything, and #included that everywhere - and the build times (in a large project) increased by an order of magnitude. It's a double whammy - the compilation time of each source file is increased, and that is multiplied by the number of source files that directly or indirectly #include that header. If turning on features like "precompiled headers" speeds up the second and subsequent builds, this is probably a contributor. (You might view precompiled headers as a solution to this, but bear in mind there are other trade-offs with using them; a sketch of GCC's mechanism follows this list.)
  3. If you are using external libs, check to make sure they are installed and configured locally. A compilation process that silently goes looking on the internet for some component (e.g. a hard-coded header file name that is on some remote server) will slow things considerably. You'd be surprised how often that happens with third-party libraries.
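
For point 2, GCC's precompiled-header mechanism is one mitigation; the header name pch.h below is just an example. GCC produces a .gch file when you compile a header directly, and uses it automatically when that header is the first #include and the compile flags match:

    // pch.h -- hypothetical precompiled header: only big, rarely-changing headers here.
    // Build it once with the same flags as the rest of the project, e.g.:
    //   g++ -std=c++11 -O2 pch.h     (produces pch.h.gch next to it)
    // Then make pch.h the first #include of every source file; GCC picks up
    // pch.h.gch automatically as long as the compilation flags match.
    #include <vector>
    #include <string>
    #include <map>
    #include <boost/lexical_cast.hpp>
    #include <opencv2/core/core.hpp>

The trade-offs mentioned above still apply: touching pch.h forces a full rebuild, and every source file silently depends on everything in it.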

Beyond that, techniques to find the problem depend on how your build process is structured.

If you're using a makefile (or some other means) that compiles source files separately, then use some way to time the individual compilation and linking commands. Bear in mind that it may be the link time that dominates.

If you're using a single compilation command (e.g. gcc invoked on multiple files in one command) then break it up into individual commands for each source file.

Once you've isolated which source file (if any) is the offender, selectively eliminate sections from it to find which code within it is the problem. As Yakk said in a comment, use a "binary search" for this to eliminate functions within the file. I'd suggest removing whole functions first (to narrow down to the offending function) and then using the same technique within the offending function.
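
One cheap way to do that bisection without actually deleting code is to wrap half of the file in #if 0 / #endif and re-time the compile; the file and function names below are placeholders, not from the question's code:

    // suspect_file.cpp -- hypothetical file being bisected for compile-time cost
    #include "heavy_stuff.h"   // whatever the file already includes

    void cheap_function() { /* ... */ }

    #if 0   // temporarily compile out the second half: if the build is suddenly
            // fast, the offender is below this line, otherwise it is above;
            // then repeat on the guilty half.
    void suspected_slow_function() {
        // heavy template instantiations, Boost usage, etc.
    }
    #endif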

It does help to structure your code so the number of functions per file is reasonably small. That reduces the need to rebuild large files for a minor change to one function, and helps isolate such problems more easily in the future.

answered Oct 10 '22 by Rob