So a colleague and I have been debating the benefits of explicit template instantiation for a C++ math library I have written that other projects use. The goals are to reduce compile time and to separate declaration from definition without affecting performance.
Essentially I have a library of useful math functions designed to work with primitives like Vector3, Vector4, Quaternion, etc., all of which are meant to be used with float or double as the template argument (and in some instances int).
So that I do not have to write these functions twice, once for floats once for double, the function implementations are templated, like so:
template<typename T>
Vector3<T> foo(const Vector4<T>& a,
const Quaternion<T>& b)
{ /* do something... */ }
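All of them are defined in .h files. This works because a function template, like an inline function, may be defined in every translation unit that uses it, although strictly speaking it is not implicitly marked inline. Most of these functions are short, and I hope they get inlined wherever they are used.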
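For reference, a minimal, self-contained sketch of this header-only layout might look like the following. Vector3, Vector4 and Quaternion are hypothetical stand-ins for the library's real types, and the body of foo is a placeholder, since the actual implementation isn't shown.

```cpp
#include <cassert>

// Hypothetical minimal stand-ins for the library's primitive types.
template<typename T> struct Vector3 { T x, y, z; };
template<typename T> struct Vector4 { T x, y, z, w; };
template<typename T> struct Quaternion { T w, x, y, z; };

// --- math.h (header-only) ---
// The full definition is visible to every translation unit that includes the
// header, so each one instantiates the specialisations it uses and the
// optimiser is free to inline them. The explicit "inline" is only a hint here;
// a function template may be defined in many translation units without it.
template<typename T>
inline Vector3<T> foo(const Vector4<T>& a, const Quaternion<T>& b)
{
    // Placeholder body (hypothetical): scale the xyz part of a by b.w.
    return Vector3<T>{ a.x * b.w, a.y * b.w, a.z * b.w };
}
```

Calling foo(v, q) with Vector4<float> and Quaternion<float> arguments deduces T = float. The same single definition also serves double and int.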
The headers are getting pretty big, though. Compile times are going up, and it's getting hard to see which functions exist just by glancing at the headers (one of the many reasons I like separating declarations from implementations).
So I could use explicit template instantiation in an accompanying .cpp file, like so:
//in .h
template<typename T>
Vector3<T> foo(const Vector4<T>& a,
const Quaternion<T>& b)
{ /* do something... */ }
//in .cpp
template Vector3<float> foo<float>(const Vector4<float>& a,
const Quaternion<float>& b);
template Vector3<double> foo<double>(const Vector4<double>& a,
const Quaternion<double>& b);
Should this help compile times? Would it affect the possibility of the functions being inlined? Are the answers to either of those questions generally compiler-specific?
An added benefit is that it verifies that the function compiles, even if I haven't used it yet.
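A single-file sketch of this first variant, assuming hypothetical stand-in types and a placeholder body: the template definition stays in the header, and the two explicit instantiation definitions (which would live in the .cpp) make the compiler instantiate and type-check both specialisations even though nothing in that file calls them.

```cpp
#include <cassert>

// Hypothetical minimal stand-ins for the library's primitive types.
template<typename T> struct Vector3 { T x, y, z; };
template<typename T> struct Vector4 { T x, y, z, w; };
template<typename T> struct Quaternion { T w, x, y, z; };

// --- math.h: the template definition stays in the header ---
template<typename T>
Vector3<T> foo(const Vector4<T>& a, const Quaternion<T>& b)
{
    // Placeholder body (hypothetical): scale the xyz part of a by b.w.
    return Vector3<T>{ a.x * b.w, a.y * b.w, a.z * b.w };
}

// --- math.cpp: explicit instantiation definitions ---
// These force both specialisations to be instantiated (and therefore
// type-checked) here, even though nothing in this file calls them.
template Vector3<float> foo<float>(const Vector4<float>& a,
                                   const Quaternion<float>& b);
template Vector3<double> foo<double>(const Vector4<double>& a,
                                     const Quaternion<double>& b);
```

On its own this does not stop other translation units that include the header from instantiating foo implicitly. That is what the extern declarations suggested in the answer are for.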
Also I could do this:
//in .h
template<typename T>
Vector3<T> foo(const Vector4<T>& a,
const Quaternion<T>& b);
//in .cpp
template<typename T>
Vector3<T> foo(const Vector4<T>& a,
const Quaternion<T>& b)
{ /* do something... */ }
template Vector3<float> foo<float>(const Vector4<float>& a,
const Quaternion<float>& b);
template Vector3<double> foo<double>(const Vector4<double>& a,
const Quaternion<double>& b);
Same questions for that method:
Should this help compile times? Would it affect the possibility of the functions being inlined? Are the answers to either of those questions generally compiler-specific?
I expect that the possibility of inlining would definitely be affected, considering the definition is not in the header.
It is nice that it manages to separate the declaration and definition for templated functions (for specific template arguments), without resorting to doing something like using a .inl included at the bottom of the .h file. This also hides the implementation from the user of the library which is beneficial (but not strictly necessary yet), while still being able to use templates so I don't have to implement a function N times.
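A single-file sketch of this second variant, using hypothetical stand-in types and a placeholder body. The comments mark which part would go in the header and which in the .cpp. Every type the library supports, including int where it applies, needs its own explicit instantiation line.

```cpp
#include <cassert>

// Hypothetical minimal stand-ins for the library's primitive types.
template<typename T> struct Vector3 { T x, y, z; };
template<typename T> struct Vector4 { T x, y, z, w; };
template<typename T> struct Quaternion { T w, x, y, z; };

// --- math.h: declaration only ---
template<typename T>
Vector3<T> foo(const Vector4<T>& a, const Quaternion<T>& b);

// --- math.cpp: definition plus explicit instantiation definitions ---
template<typename T>
Vector3<T> foo(const Vector4<T>& a, const Quaternion<T>& b)
{
    // Placeholder body (hypothetical): scale the xyz part of a by b.w.
    return Vector3<T>{ a.x * b.w, a.y * b.w, a.z * b.w };
}

template Vector3<float> foo<float>(const Vector4<float>& a,
                                   const Quaternion<float>& b);
template Vector3<double> foo<double>(const Vector4<double>& a,
                                     const Quaternion<double>& b);

// A client .cpp that only includes math.h can call foo<float> and foo<double>;
// the linker resolves them to the instances above. A call to foo<int> would
// compile but fail to link, because nothing instantiates it. Callers in other
// translation units never see the body, so they cannot inline it without
// link-time optimisation.
```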
Is there any way of allowing inlining by adjusting the method?
I have found it difficult just googling for an answer to these questions, and the standards specification is hard to comprehend on these subjects (for me at least).
BTW, this is expected to compile with VS2010, VS2012, and GCC 4.7.
Any assistance would be appreciated.
Thanks
I'm assuming your technique is intended to do the same as the answer to this question: Template instantiation effect on compile duration
To achieve the desired result, you would also need to prevent automatic instantiation by declaring the explicit instantiations in the header using extern. See Explicit instantiation declaration with extern.
//in .h
template<typename T>
Vector3<T> foo(const Vector4<T>& a,
const Quaternion<T>& b);
extern template Vector3<float> foo<float>(const Vector4<float>& a,
const Quaternion<float>& b);
extern template Vector3<double> foo<double>(const Vector4<double>& a,
const Quaternion<double>& b);
//in .cpp
template<typename T>
Vector3<T> foo(const Vector4<T>& a,
const Quaternion<T>& b)
{ /* do something...*/ }
template Vector3<float> foo<float>(const Vector4<float>& a,
const Quaternion<float>& b);
template Vector3<double> foo<double>(const Vector4<double>& a,
const Quaternion<double>& b);
Should this help compile times? Would it affect the possibility of the functions being inlined? Are the answers to either of those questions generally compiler-specific?
The answer is highly dependent on the compiler, and is best determined empirically, but we can generalise about it.
We can assume that an increase in compile time comes not from the cost of parsing the additional template angle-bracket syntax, but from the cost of the (complex) process of template instantiation. If this is the case, the cost of using a given template specialization in multiple translation units should significantly increase compile times only if the instantiation is expensive and the compiler performs the instantiation more than once.
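As a rough illustration of that reasoning (a simplified model, not a measurement): suppose one specialisation such as foo<float> is used in N translation units, parsing the header costs p per unit, and instantiating the specialisation costs c. Ignoring any deduplication by the compiler:

$$T_{\text{implicit}} \approx N(p + c), \qquad T_{\text{explicit+extern}} \approx Np + c$$

The saving, roughly (N - 1)c, is only significant when c is large and N > 1.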
The C++ standard implicitly allows the compiler to perform the instantiation of each unique template specialisation only once across all translation units. That is, instantiation of template functions can be deferred and performed after the initial compilation, as described in the Comeau documentation. Whether this optimisation is implemented or not depends on the compiler, but is certainly not implemented in any version of MSVC prior to 2015.
With the definition moved into the .cpp file, callers in other translation units never see the function body, so this technique prevents inlining unless the toolchain supports cross-module inlining. Newer versions of MSVC, GCC and Clang all support cross-module inlining at link time with additional options: /GL when compiling and /LTCG when linking for MSVC (LTCG), and -flto for GCC and Clang (LTO). See Can the linker inline functions?
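If inlining matters more than hiding the implementation, one variation to try (a sketch, not something proposed in the question or the answer above) keeps the definition in the header, marks it inline, and adds the extern declarations. The C++11 standard's note on explicit instantiation declarations says that an inline function named in one is still implicitly instantiated when used, so its body can be considered for inlining, but no out-of-line copy should be generated in that translation unit. The types and the body below are hypothetical placeholders.

```cpp
#include <cassert>

// Hypothetical minimal stand-ins for the library's primitive types.
template<typename T> struct Vector3 { T x, y, z; };
template<typename T> struct Vector4 { T x, y, z, w; };
template<typename T> struct Quaternion { T w, x, y, z; };

// --- math.h ---
// The definition stays visible (and inline) so callers can still inline it.
template<typename T>
inline Vector3<T> foo(const Vector4<T>& a, const Quaternion<T>& b)
{
    // Placeholder body (hypothetical): scale the xyz part of a by b.w.
    return Vector3<T>{ a.x * b.w, a.y * b.w, a.z * b.w };
}

// Explicit instantiation declarations: tell every including translation unit
// that the out-of-line copies are provided elsewhere.
extern template Vector3<float> foo<float>(const Vector4<float>& a,
                                          const Quaternion<float>& b);
extern template Vector3<double> foo<double>(const Vector4<double>& a,
                                            const Quaternion<double>& b);

// --- math.cpp (exactly one translation unit) ---
// Explicit instantiation definitions: provide the out-of-line copies used
// whenever a call is not inlined.
template Vector3<float> foo<float>(const Vector4<float>& a,
                                   const Quaternion<float>& b);
template Vector3<double> foo<double>(const Vector4<double>& a,
                                     const Quaternion<double>& b);
```

This probably saves less compile time than moving the body into the .cpp file, because every translation unit still parses the definition and may still instantiate it for inlining. Compilers also differ in how closely they follow that note, so it is worth measuring on the compilers you target.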