Profiling template metaprogram compilation time

Tags:

I'm working on a C++ project with extensive compile-time computations. Long compilation time is slowing us down. How might I find out the slowest parts of our template meta-programs so I can optimize them? (When we have slow runtime computations, I have many profilers to choose from, e.g. valgrind's callgrind tool. So I tried building a debug GCC and profiling it compiling our code, but I didn't learn much from that.)

I use GCC and Clang, but any suggestions are welcome.

I found profile_templates on Boost's site, but it seems to be thinly documented and require the jam/bjam build system. If you show how to use it on a non-jam project¹, I will upvote you. https://svn.boost.org/svn/boost/sandbox/tools/profile_templates/ appears to count number-of-instantiations, whereas counting time taken would be ideal.

¹_{Our project uses CMake and is small enough that hacking together a Jamfile just for template profiling could be acceptable.}

953

asked Apr 04 '13 17:04

idupree

2 Answers

I know this is an old question, but there is a newer answer that I would like to give.

There is a clang-based set of projects that target this particular problem. The first component is an instrumentation onto the clang compiler which produces a complete trace of all the template instantiations that occurred during compilation, with timing values and optionally memory usage counts as well. That tool is called Templight, as is accessible here (currently needs to compile against a patched clang source tree):

https://github.com/mikael-s-persson/templight

A second component is a conversion tool that allows you to convert the templight traces into other formats, such as easily parsable text-based format (yaml, xml, text, etc.) and into formats that can more easily be visualized, such as graphviz / graphML, and more importantly a callgrind output that can be loaded into KCacheGrind to visualize and inspect the meta-call-graph of template instantiations and their compile-time costs, such as this screenshot of a template instantiation profile of a piece of code that creates a boost::container::vector and sorts it with std::sort:

enter image description here

Check it out here:

https://github.com/mikael-s-persson/templight-tools

Finally, there is also another related project that creates an interactive shell and debugger to be able to interactively walk up and down the template instantiation graph:

https://github.com/sabel83/metashell

169

answered Dec 04 '22 07:12

Mikael Persson

I've been working since 2008 on a library that uses template metaprogramming heavily. There is a real need for better tools or approaches for understanding what consumes the most compile time.

The only technique I know of is a divide and conquer approach, either by separating code into different files, commenting out bodies of template definitions, or by wrapping your template instantiations in #define macros and temporarily redefining those macros to do nothing. Then you can recompile the project with and without various instantiations and narrow down.

Incidentally just separating the same code into more, smaller files may make it compile faster. I'm not just talking about opportunity for parallel compilation - even serially, I observed it to still be faster. I've observed this effect in gcc both when compiling my library, and when compiling Boost Spirit parsers. My theory is that some of the symbol resolution, overload resolution, SFINAE, or type inference code in gcc has an O(n log n) or even O(n^2) complexity with respect to the number of type definitions or symbols in play in the execution unit.

Ultimately what you need to do is carefully examine your templates and separate what really depends on the type information from what does not, and use type erasure and virtual functions whereever possible on the portion of the code that does not actually require the template types. You need to get stuff out of the headers and into cpp files if that part of the code can be moved. In a perfect world the compiler should be able to figure this out for itself - you shouldn't have to manually move this code to babysit it - but this is the state of the art with the compilers we have today.

answered Dec 04 '22 07:12

Dennis

Related questions
                            
                                Why does my IIS7 application pool shutdown after an exception in a DLL called from an ASP.NET page?
                            
                                restful api authentication confusion with oauth2
                            
                                Reference documents with ObjectId when saving in mongoose
                            
                                how do I get the subtrees of dendrogram made by scipy.cluster.hierarchy
                            
                                What is the difference between using IEqualityComparer and Equals/GethashCode Override?
                            
                                How to test or exclude private unreachable code from code coverage
                            
                                How can we listen for errors that do not trigger window.onerror?
                            
                                Page.Title vs Title tag in asp.net
                            
                                Get last day of Month
                            
                                Passing a non-copyable closure object to std::function parameter [duplicate]
                            
                                What is difference between Pre-Signed Url and Signed Url?
                            
                                Eclipse won't compile/run java file

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With