 

Will instantiating templates in precompiled headers reduce compile times?

Example: Say I include in my precompiled header file:

#include <vector>

Since a few instantiations of vector, such as std::vector<float> and std::vector<int>, are used often in my project, will it reduce compile time if I also instantiate them in the precompiled header, like this:

#include <vector>
template class std::vector<float>;
template class std::vector<int>;

Going further, would it even make sense to add dummy functions to the precompiled header that call a few member functions:

namespace pch_detail {
inline auto func() {
  auto&& v = std::vector<float>{};
  v.size();
  v.begin();
  v.front();
}
}

I'm very unsure of how translation units and templates really work, but it seems to me that if I instantiate them in the precompiled header, they should not need to be instantiated again for every .cpp file.

Update

Tested on a real-world code base with Visual Studio 2017 and some instantiations of commonly used template classes.

  1. With commonly used template classes instantiated: 71731 ms
  2. Without instantiation: 68544 ms

Hence, at least in my case, it took slightly more time.

Viktor Sehr asked Jul 28 '17 09:07



2 Answers

It can make a difference, yes.

Instantiation in translation units can then exploit data in the precompiled header, and a compiler can read that more quickly than the C++ standard library headers.

But you will have to maintain a list of instantiations, so this compile-time optimisation might be more trouble than it's worth - your idea could end up having the opposite effect if you have instantiations that are no longer needed.
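One way to keep such a list manageable is to write it once and expand it twice. The following is only a minimal sketch under stated assumptions, not something taken from the question: `Stack`, `PROJECT_INSTANTIATIONS`, `DECLARE_INSTANTIATION`, `DEFINE_INSTANTIATION` and `push_three` are hypothetical names, and a small project-owned template stands in for std::vector so that the example stays self-contained. Everything is shown in one file; the comments mark what would go into the precompiled header and what would go into a single .cpp file.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical project template standing in for a commonly used container.
template <typename T>
class Stack {
public:
    void push(const T& value) {
        assert(count_ < kCapacity);
        data_[count_++] = value;
    }
    const T& top() const { return data_[count_ - 1]; }
    std::size_t size() const { return count_; }

private:
    static constexpr std::size_t kCapacity = 16;
    T data_[kCapacity] = {};
    std::size_t count_ = 0;
};

// The single list to maintain (assumed to live in the precompiled header).
#define PROJECT_INSTANTIATIONS(X) \
    X(Stack<int>)                 \
    X(Stack<float>)

// Precompiled header part: explicit instantiation declarations, telling
// every translation unit that the instantiations are provided elsewhere.
#define DECLARE_INSTANTIATION(Type) extern template class Type;
PROJECT_INSTANTIATIONS(DECLARE_INSTANTIATION)

// Single .cpp part: explicit instantiation definitions, emitted once.
// In one translation unit the definition must come after the declaration.
#define DEFINE_INSTANTIATION(Type) template class Type;
PROJECT_INSTANTIATIONS(DEFINE_INSTANTIATION)

// Hypothetical client code that uses one of the listed instantiations.
inline std::size_t push_three() {
    Stack<int> s;
    s.push(1);
    s.push(2);
    s.push(3);
    return s.size();
}
```

Removing a type from `PROJECT_INSTANTIATIONS` drops both its declaration and its definition, so stale entries are at least easy to delete. Whether the build gets faster still has to be measured. A type whose template arguments contain a comma would need a type alias before it can be passed to the macro.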

Bathsheba answered Sep 21 '22 14:09


I've been thinking along the same lines and have had this question in mind too. (But I'm a noob...)

Another reference: https://msdn.microsoft.com/en-us/library/by56e477.aspx

Maybe an explicit extern is needed?

However, by link time the .cpp files have been compiled into .obj files, but the .pch is not an .obj... So where will the instantiations of the template functions live? Will the linker be able to read things from the .pch?

Or do we need a separate .cpp file dedicated to instantiating them, while declaring all client references as extern?

And... link-time code generation?

I gave it a try

It helps a little. Tested with VS2012: turn on compiler profiling and watch the build output.

// stdafx.h
#pragma once

#include "targetver.h"

#include <stdio.h>
#include <tchar.h>
#include <stdlib.h>

#include <vector>
#include <set>
#include <deque>

// stdafx.cpp
#include "stdafx.h"

using namespace std;

template class set<int>;
template set<int>::set();
template set<int>::_Pairib set<int>::insert(const int&);

template class deque<int>;
template deque<int>::deque();
template void deque<int>::push_back(const int&);

template class vector<int>;
template vector<int>::vector();
template void vector<int>::push_back(const int&);

// playcpp.cpp, the entry point

#include "stdafx.h"

using namespace std;
// toggle this block of code
// change a space in the "printf", then build (incrementally)
/*
extern template class set<int>;
extern template set<int>::set();
extern template set<int>::_Pairib set<int>::insert(const int&);

extern template class deque<int>;
extern template deque<int>::deque();
extern template void deque<int>::push_back(const int&);

extern template class vector<int>;
extern template vector<int>::vector();
extern template void vector<int>::push_back(const int&);
*/

int _tmain(int argc, _TCHAR* argv[])
{
    set<int> s;
    deque<int> q;
    vector<int> v;
    for(int i=0;i<10000;i++){
        int choice=rand()%3;
        int value=rand()%100;
        switch(choice){
        case 0: s.insert(value); break;
        case 1: q.push_back(value); break;
        case 2: v.push_back(value); break;
        }
    }
    for(const auto &i:s)
        printf("%d",i);
    for(const auto &i:q)
        printf("%d ",i);
    for(const auto &i:v)
        printf("%d ",i);
    return 0;
}

Results (many other lines omitted)

With extern declarations:

1>               1630 ms  Build                                      1 calls
...
1>      757 ms  ClCompile                                  1 calls
1>      787 ms  Link                                       1 calls

Without extern declarations:

1>               1801 ms  Build                                      1 calls
...
1>      774 ms  Link                                       1 calls
1>      955 ms  ClCompile                                  1 calls

(Build output translated from the Chinese-language version of Visual Studio.)

I adjusted the power settings so that the CPU runs slowly, which makes the timings longer and less sensitive to noise.

The above is just one sample for each case, and the numbers are still quite unstable: either case may sometimes take ~200 ms longer.

But over many runs there is consistently a difference of about 200 ms on average. All I can say is that the averages are around 1650 ms and 1850 ms, with the whole difference in ClCompile's time.

Of course, more template member functions are called than the ones I instantiated; I just didn't have time to figure out all their type signatures... (Could anyone tell me which (const) iterator it will use?)
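On the iterator question, a small sketch may help. A range-based for loop over a non-const container calls the non-const begin() and end(), so it gets the container's `iterator` type; the `const` in `const auto &i` only affects how each element is bound. The names `RangeForIterator` and `sum_like_range_for` are hypothetical helpers made up for this illustration.

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <type_traits>
#include <utility>
#include <vector>

// Type that a range-based for loop obtains from begin() for a given
// (possibly const-qualified) container type.
template <typename Container>
using RangeForIterator = decltype(std::declval<Container&>().begin());

// Non-const containers, as in the test program: the plain iterator type.
static_assert(std::is_same<RangeForIterator<std::vector<int>>,
                           std::vector<int>::iterator>::value,
              "vector: non-const begin() returns iterator");
static_assert(std::is_same<RangeForIterator<std::deque<int>>,
                           std::deque<int>::iterator>::value,
              "deque: non-const begin() returns iterator");
static_assert(std::is_same<RangeForIterator<std::set<int>>,
                           std::set<int>::iterator>::value,
              "set: non-const begin() returns iterator");

// Only a const container yields const_iterator.
static_assert(std::is_same<RangeForIterator<const std::vector<int>>,
                           std::vector<int>::const_iterator>::value,
              "const vector: begin() returns const_iterator");

// Roughly what `for (const auto &i : v)` does for a non-const vector.
inline int sum_like_range_for(std::vector<int>& v) {
    int total = 0;
    for (std::vector<int>::iterator it = v.begin(), last = v.end();
         it != last; ++it) {
        const int& i = *it;  // the loop variable binds to the element
        total += i;
    }
    return total;
}
```

As far as I know, an explicit instantiation of the whole class (`template class vector<int>;`) already instantiates its non-template member functions, including begin() and end(), so listing them one by one should not be necessary.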

Well, but then... are there better ways of doing it?

farter answered Sep 21 '22 14:09