How well does NVCC optimize device code? Does it do any sort of optimizations like constant folding and common subexpression elimination?
E.g., will it reduce the following:
float a = 1 / sqrtf(2 * M_PI);
float b = c / sqrtf(2 * M_PI);
to this:
float sqrt_2pi = sqrtf(2 * M_PI); // Compile time constant
float a = 1 / sqrt_2pi;
float b = c / sqrt_2pi;
What about more clever optimizations that rely on knowing the semantics of math functions:
float a = 1 / sqrtf(c * M_PI);
float b = c / sqrtf(M_PI);
to this:
float sqrt_pi = sqrtf(M_PI); // Compile time constant
float a = 1 / (sqrt_pi * sqrtf(c));
float b = c / sqrt_pi;
The purpose of nvcc, the CUDA compiler driver, is to hide the intricate details of CUDA compilation from developers. It accepts a range of conventional compiler options, such as those for defining macros and include/library paths, and for steering the compilation process.
NVCC is a compiler driver that works by invoking all the necessary tools and compilers, such as cudacc, g++, and cl. It can output C code (CPU code) that must then be compiled with the rest of the application using another tool, or it can emit PTX or object code directly.
nvcc is the compiler driver used to compile both .cu and .cpp files. For host code it uses whichever cl.exe (on Windows) or gcc (on Linux) executable it can find.
The compiler is way ahead of you. In your example:
float a = 1 / sqrtf(2 * M_PI);
float b = c / sqrtf(2 * M_PI);
nvopencc (Open64) will emit this:
mov.f32 %f2, 0f40206c99; // 2.50663
div.full.f32 %f3, %f1, %f2;
mov.f32 %f4, 0f3ecc422a; // 0.398942
which is equivalent to
float b = c / 2.50663f;
float a = 0.398942f;
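The `0f...` literals in the PTX listing are the raw IEEE 754 bit patterns of single-precision constants, written in hexadecimal. A minimal sketch of how one might decode them on the host to confirm the folded values follows. The helper name `ptx_bits_to_float` is hypothetical, and the code assumes `float` is a 32-bit IEEE 754 type on the host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret the 32 bits that follow the "0f" prefix
   of a PTX literal as a single-precision float.
   Assumes the host float is 32-bit IEEE 754. */
static float ptx_bits_to_float(uint32_t bits) {
    float value;
    /* memcpy avoids the undefined behavior of casting between pointer types. */
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, `ptx_bits_to_float(0x40206c99u)` yields roughly 2.50663, which is sqrtf(2 * M_PI), and `ptx_bits_to_float(0x3ecc422au)` yields roughly 0.398942, its reciprocal.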
The second case gets compiled to this:
float a = 1 / sqrtf(c * 3.14159f); // 0f40490fdb
float b = c / 1.77245f; // 0f3fe2dfc5
I am guessing the expression for a generated by the compiler should be more accurate than your "optimized" version, but about the same speed.