Suppose I have
template <bool UsesFastMath> void foo(float* data, size_t length);
and I want to compile one instantiation with -ffast-math
(--use-fast-math
for nvcc), and the other instantiation without it.
This can be achieved by instantiating each of the variants in a separate translation unit, and compiling each of them with a different command-line - with and without the switch.
My question is whether it's possible to indicate to popular compilers (*) to apply or not apply -ffast-math
for individual functions - so that I'll be able to have my instantiations in the same translation unit.
Notes:
(*) by popular compilers I mean any of: gcc, clang, msvc icc, nvcc (for GPU kernel code) about which you have that information.
Linking using gcc with -ffast-math makes it disable subnormal numbers for the application by adding this code in a global constructor that runs before main .
Using -march=native enables all instruction subsets supported by the local machine (hence the result might not run on different machines). Using -mtune=native produces code optimized for the local machine under the constraints of the selected instruction set.
In GCC you can declare functions like following:
__attribute__((optimize("-ffast-math")))
double
myfunc(double val)
{
return val / 2;
}
This is GCC-only feature.
See working example here -> https://gcc.gnu.org/ml/gcc/2009-10/msg00385.html
It seems that GCC not verifies optimize() arguments. So typos like "-ffast-match" will be silently ignored.
As of CUDA 7.5 (the latest version I am familiar with, although CUDA 8.0 is currently shipping), nvcc
does not support function attributes that allow programmers to apply specific compiler optimizations on a per-function basis.
Since optimization configurations set via command line switches apply to the entire compilation unit, one possible approach is to use as many different compilation units as there are different optimization configurations, as already noted in the question; source code may be shared and #include
-ed from a common file.
With nvcc
, the command line switch --use_fast_math
basically controls three areas of functionality:
You can apply some of these changes with per-operation granularity by using appropriate intrinsics, others by using PTX inline assembly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With