
Does extern template prevent inlining of functions?

I'm not entirely clear on how the new extern template feature is meant to work in C++11. I understand that it is intended to help speed up compilation and to simplify linking issues with shared libraries. Does that mean that the compiler does not even parse the function body, forcing a non-inlined call to be made? Or does it simply instruct the compiler not to generate an actual function body when a non-inlined call is made? Link-time code generation notwithstanding, obviously.

As a concrete example of where the difference might matter, consider a function that operates on an incomplete type.

//Common header
template<typename T>
void DeleteMe(T* t) {
    delete t;
}

struct Incomplete;
extern template void DeleteMe(Incomplete*);

//Implementation file 1
#include "common_header.h"
struct Incomplete { };
template void DeleteMe(Incomplete*);

//Implementation file 2
#include "common_header.h"
int main() {
   Incomplete* p = factory_function_not_shown();
   DeleteMe(p);
}

Within "Implementation file 2", it is unsafe to delete a pointer to Incomplete. So an inlined version of DeleteMe would fail. But if it is left as an actual function call, and the function itself were generated within "Implementation file 1", everything will work correctly.

As a corollary, are the rules the same for member functions of templated classes with a similar extern template class declaration?

As an experiment, MSVC produces the correct output for the above code, but if the extern line is removed it generates a warning about deleting an incomplete type. However, this is a remnant of a non-standard extension they introduced years ago, so I'm not sure how much I can trust this behavior. I don't have access to any other build environments to experiment on [save ideone et al, but being limited to one translation unit is rather limiting in this case].

Dennis Zickefoose asked Jul 17 '11 19:07



3 Answers

The idea behind extern templates is to make explicit template instantiations more useful.

As you know, in C++03, you can explicitly instantiate a template using this syntax:

template class SomeTemplateClass<int>;
template void foo<bool>();

This tells the compiler to instantiate the template in the current translation unit. However, this doesn't stop implicit instantiations from happening: the compiler still has to perform all implicit instantiations and then merge them together again during linking.

Example:

// a.h
template <typename> void foo() { /* ... */ }

// a.cpp
#include "a.h"
template void foo<int>();

// b.cpp
#include "a.h"
int main()
{
    foo<int>();
    return 0;
} 

Here, a.cpp explicitly instantiates foo<int>(), but once we go to compile b.cpp, it will instantiate it again because b.cpp has no idea that a.cpp is going to instantiate it anyway. For large functions with many different translation units doing implicit instantiations, this can add quite significantly to compile and link time. It may also cause the function to be unnecessarily inlined, which can lead to significant code bloat.

With extern templates, you can let other source files know that you plan to instantiate the template explicitly:

// a.h
template <typename> void foo() { /* ... */ }
extern template void foo<int>();

This way, b.cpp won't cause an instantiation of foo<int>(). The function will be instantiated in a.cpp and will be linked like any normal function. It's also much less likely to be inlined.

Note that this doesn't prevent inlining -- the function could still be inlined at link time in exactly the same way that a normal non-inline function can still be inlined.
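As a minimal single-file sketch of the two declarations discussed above, assuming a hypothetical template `twice` in place of `foo` (the name and the helper functions are illustrative, not from the original): in a real project the `extern template` line would sit in the header and the `template` line in exactly one .cpp file. Both appear in one file here only so that the sketch compiles on its own.

```cpp
#include <cassert>
#include <string>

// Hypothetical template standing in for foo<T>() from the answer above.
template <typename T>
T twice(T value) {
    return value + value;
}

// Explicit instantiation declaration (C++11). Normally placed in the header:
// it tells this translation unit not to emit its own copy of twice<int>.
extern template int twice<int>(int);

// Caller: relies on twice<int> being provided by an explicit instantiation
// definition somewhere in the program.
int use_twice(int x) {
    return twice<int>(x);
}

// Explicit instantiation definition. Normally placed in exactly one .cpp
// file; when both appear in one translation unit it must follow the
// declaration.
template int twice<int>(int);

// twice<std::string> has no extern declaration, so it is instantiated
// implicitly at the point of use, as in C++03.
std::string use_twice_string(const std::string& s) {
    return twice(s);
}

// Small usage check (hypothetical helper).
void check_twice() {
    assert(use_twice(21) == 42);
    assert(use_twice_string("ab") == "abab");
}
```

If the explicit instantiation definition is omitted from every translation unit, the program may fail to link, or may link only when the optimizer happens to inline every call.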

EDIT: For those that are curious, I just did a quick test to see how much time g++ spends instantiating templates. I tried instantiating std::sort<int*> in a varying number of translation units, with and without the instantiation being suppressed. The result was conclusive: 30ms per instantiation of std::sort. There's definitely time to be saved here in a large project.

Peter Alexander answered Oct 21 '22 09:10


Here is an interesting example:

#include <algorithm>
#include <string>

extern template class std::basic_string<char>;
int foo(std::string s)
{
    int res = s.length();
    res += s.find("some substring");
    return res;
}

When compiled with g++-7.2 at -O3, this produces a non-inlined call to string::find but an inlined call to string::length.

Without the extern template, everything is indeed inlined. Clang has the same behaviour, and MSVC is almost unable to inline anything in any case.

So the answer is: it depends, and compilers may have special heuristics for this.

Jean-Michaël Celerier answered Oct 21 '22 10:10


Using extern template class does not seem to prevent inlining. I will illustrate this with an example. It is a bit involved, but it is the simplest I can come up with.

In file a.h we define template class CFoo,

#ifndef A_H
#define A_H
#include <iostream>

template <typename T> class CFoo{
  public: CFoo(){
      std::cout << "CFoo Constructor, edit 0" << std::endl;
    }
};

extern template class CFoo<int>;
#endif

At the end of a.h we use extern template class CFoo<int> to indicate to any translation unit that includes a.h that it does not need to generate any code for CFoo<int>. It's a promise we make that all things CFoo<int> will link smoothly.

In file c.cpp we have,

#include "a.h"

void run(){
  CFoo<int> cf;
}

Due to the extern template class 'promise' at the end of a.h, the translation unit of c.cpp does not 'need to' generate any code for class CFoo<int>.

Finally we declare a main function in b.cpp,

void run();
int main(){
  run();
  return 0;
}

There is nothing fancy in b.cpp: we simply declare void run(), which will be linked to its implementation in the translation unit c.cpp at link time. For completeness, here is a makefile:

cflags = -std=c++11 -O1

b : b.o a.o c.o
  g++ ${cflags} b.o a.o c.o -o b

b.o : b.cpp 
  g++ ${cflags} -c b.cpp -o b.o

c.o : c.cpp 
  g++ ${cflags} -c c.cpp -o c.o

a.o : a.cpp a.h
  g++ ${cflags} -c a.cpp -o a.o

clean:
  rm -rf a.o b.o c.o b

Using this makefile compiles and links an executable b which outputs "CFoo Constructor, edit 0" when run. But note! In the example above we do not seem to have instantiated CFoo<int> anywhere: CFoo<int> is definitely not instantiated in translation unit b.cpp, as the header is not included there, and translation unit c.cpp was told that it didn't need to generate code for CFoo<int>. So what's going on?

Make one change to the makefile: replace -O1 with -O0, then run make clean followed by make.

Now, the link step results in an error (using gcc 4.8.4):

c.o: In function `run()':
c.cpp:(.text+0x10): undefined reference to `CFoo<int>::CFoo()'

This is the error we would expect if there were no inlining in the first place, which suggests that at -O1 the constructor was inlined into run(). At least this is the conclusion I come to; further ideas are very welcome.
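One reading that fits this observation, as far as I understand the C++11 rules on explicit instantiation: an explicit instantiation declaration does not suppress implicit instantiation of inline functions, and a member function defined inside the class body (like the CFoo constructor) is implicitly inline. The compiler may therefore still inline it, but is not obliged to emit an out-of-line copy. The single-file sketch below uses a hypothetical `Counter` class (names are illustrative, not from the original) to contrast an in-class member with an out-of-class one. The explicit instantiation definition at the end plays the role of a.cpp.

```cpp
#include <cassert>

// Hypothetical class template, modelled loosely on CFoo from the answer.
template <typename T>
class Counter {
public:
    // Defined inside the class body, so implicitly inline: the compiler may
    // still instantiate and inline it despite the extern declaration below.
    T get() const { return value_; }

    // Declared here but defined outside the class: not inline, so callers
    // depend on an explicit instantiation definition existing somewhere.
    void add(T delta);

private:
    T value_{};
};

template <typename T>
void Counter<T>::add(T delta) {
    value_ += delta;
}

// The promise, normally placed at the end of the header.
extern template class Counter<int>;

// Caller, playing the role of run() in c.cpp.
int count_to(int n) {
    Counter<int> c;
    for (int i = 0; i < n; ++i) {
        c.add(1);
    }
    return c.get();
}

// Keeping the promise, normally placed in exactly one .cpp file.
template class Counter<int>;

// Small usage check (hypothetical helper).
void check_counter() {
    assert(count_to(0) == 0);
    assert(count_to(5) == 5);
}
```

Without the final explicit instantiation definition, whether this links would depend on whether the compiler chose to inline every member call, which matches the -O0 versus -O1 difference seen above.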

To get linking to succeed with -O0, we need to keep our promise and provide an instantiation of CFoo<int>. We provide this in file a.cpp:

#include "a.h"
template class CFoo<int>;

We can now be guaranteed that CFoo<int> is instantiated in the translation unit of a.cpp, and our promise will be kept. As an aside, note that template class CFoo<int> in a.cpp is preceded by extern template class CFoo<int> via the inclusion of a.h, which is not problematic.

Finally, I find this unpredictable, optimisation-dependent behaviour annoying: modifications to a.h followed by recompilation of a.cpp might not be reflected in run(), as they would be if there were no inlining (try changing the output of the CFoo constructor and running make again).

newling answered Oct 21 '22 10:10