Consider the following:
// foo.h
class Foo
{
public:
    int x = 2;
    int y = 3;
    void DoSomething_SSE();
    void DoSomething_AVX();
    // ( Implicit default constructor is generated "inline" here )
};

// Foo_AVX.cpp, compiled with -mavx or /arch:AVX
void Foo::DoSomething_AVX()
{
    // AVX optimised implementation here
}

// Foo_SSE.cpp, compiled with -msse2 or /arch:SSE2
void Foo::DoSomething_SSE()
{
    // SSE optimised implementation here
}
Here's the problem: the compiler generates the implicit default constructor with 'inline' semantics in each translation unit that uses it (note: inline semantics do not mean the function will necessarily be inlined). In cases where the constructor is not inlined, the linker will then keep one out-of-line copy and discard the others.
If the linker chooses the constructor generated in the AVX compilation unit, this code will then crash with an illegal instruction on a machine which doesn't support AVX.
It is possible to stop the crash by adding an explicit default constructor, either force-inlined (__forceinline on MSVC, __attribute__((always_inline)) on gcc/clang) so that no shared out-of-line copy is ever called, or declared in the header and defined in a compilation unit which is compiled with the lowest common denominator instruction set.
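The second workaround can be sketched as below. This is a minimal sketch, not a definitive fix: the file name Foo_Base.cpp is hypothetical, and the two "files" are shown in one listing only for brevity. Defaulting the constructor outside the class body makes it an ordinary non-inline function, so exactly one copy exists and it is compiled with the baseline flags.

```cpp
// ---- foo.h (sketch) ----
class Foo
{
public:
    Foo();  // declared only: no inline definition is generated per translation unit

    int x = 2;
    int y = 3;
    void DoSomething_SSE();
    void DoSomething_AVX();
};

// ---- Foo_Base.cpp (hypothetical name), compiled WITHOUT -mavx or /arch:AVX ----
// Defaulted outside the class, so this is the single out-of-line definition.
// The default member initialisers (x = 2, y = 3) are still applied.
Foo::Foo() = default;
```

Every translation unit, including the AVX one, then calls this one baseline constructor instead of emitting its own copy.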
However, surely there's a way to get the language to handle this better than having to write dummy functions?
(llvm-clang++ 9.x.x/x64 on Mac OS X)
Compile the AVX translation units with gcc or clang using -mavx -fno-implement-inlines. If the functions don't simply inline, the linker will then have to take the symbol from the SSE translation units.
From the GCC manual:
-fno-implement-inlines
To save space, do not emit out-of-line copies of inline functions controlled by #pragma implementation. This causes linker errors if these functions are not inlined everywhere they are called.
Clang supports this option, too.
This does not disable inlining of anything, it only disables emitting a stand-alone definition of functions declared as inline or in a class definition.
With optimization enabled, a small default constructor like in the question should inline (and use the target ISA options of the current function/compilation unit), making this irrelevant most of the time. But it will make sure that un-optimized builds work properly on non-AVX machines.
It appears that another option is to not use compiler flags to set the instruction set - leave them on the default, and wrap only the functions which require the enhanced instruction set:
#include "foo.h"

// Switch on AVX optimisations for the function where they're needed
#pragma clang attribute push (__attribute__((target("arch=sandybridge"))), apply_to = function)
void Foo::DoSomething_AVX()
{
    // AVX optimised implementation here
}
#pragma clang attribute pop
Using #pragma clang attribute push(...), while a bit more long-winded than simple [[]] or __attribute__(()), seems to have the advantage that the attribute is automatically applied to any template code etc. instantiated from within the pragma's scope.
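A rough sketch of how that scoping might look with a template helper follows. Everything named here (SumImpl_AVX, SumFloats_AVX, SumFloats_Baseline, CpuHasAvx, SumFloats, and the SKETCH_USE_TARGET_PRAGMA macro) is hypothetical and not from the original code. It uses target("avx") rather than arch=sandybridge so that the runtime check matches the feature exactly, and it assumes the pragma covers the template defined inside the region, as described above. On compilers other than clang the pragma is skipped and everything is built with the default instruction set.

```cpp
#include <cstddef>

// Only use the clang pragma where it is meaningful (clang targeting x86).
#if defined(__clang__) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_USE_TARGET_PRAGMA 1
#endif

#ifdef SKETCH_USE_TARGET_PRAGMA
#pragma clang attribute push (__attribute__((target("avx"))), apply_to = function)
#endif

// Hypothetical helper template, defined inside the pragma region so that
// it is compiled for the same target as its caller below.
template <typename T>
T SumImpl_AVX(const T* data, std::size_t count)
{
    T total = T();
    for (std::size_t i = 0; i < count; ++i)
        total += data[i];
    return total;
}

// Entry point that may contain AVX instructions when built with clang.
float SumFloats_AVX(const float* data, std::size_t count)
{
    return SumImpl_AVX(data, count);
}

#ifdef SKETCH_USE_TARGET_PRAGMA
#pragma clang attribute pop
#endif

// Baseline version, compiled with the default instruction set.
float SumFloats_Baseline(const float* data, std::size_t count)
{
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i)
        total += data[i];
    return total;
}

// Runtime check; reports false on compilers/targets without the builtin.
bool CpuHasAvx()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;
#endif
}

// Dispatcher: only calls the AVX path on machines that support it.
float SumFloats(const float* data, std::size_t count)
{
    return CpuHasAvx() ? SumFloats_AVX(data, count)
                       : SumFloats_Baseline(data, count);
}
```

Callers would use SumFloats() only; nothing outside the pragma region is compiled with AVX enabled, so inline functions pulled in from headers elsewhere in the file stay at the baseline instruction set.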