When I use float atomicAdd(float *address, float val) to add a float value smaller than approx. 1e-39 to 0, the addition does not work, and the value at address remains 0.
Here is the simplest code:
__device__ float test[6] = {0};
__global__ void testKernel() {
float addit = sinf(1e-20);
atomicAdd(&test[0], addit);
test[1] += addit;
addit = sinf(1e-37);
atomicAdd(&test[2], addit);
test[3] += addit;
addit = sinf(1e-40);
atomicAdd(&test[4], addit);
test[5] += addit;
}
When I run the code above as testKernel<<<1, 1>>>(); and stop with the debugger I see:
test 0x42697800
[0] 9.9999997e-21
[1] 9.9999997e-21
[2] 9.9999999e-38
[3] 9.9999999e-38
[4] 0
[5] 9.9999461e-41
Notice the difference between test[4] and test[5]. Both did the same thing, yet the simple addition worked, and the atomic one did nothing at all. What am I missing here?
Update: System info: CUDA 5.5.20, NVidia Titan card, Driver 331.82, Windows 7x64, Nsight 3.2.1.13309.
atomicAdd is a special instruction that does not necessarily obey the same flush and rounding behaviors that you might get if you specify for example -ftz=true or -ftz=false on other floating point operations (e.g. ordinary fp add)
As documented in the PTX ISA manual:
The floating-point operation .add is a single-precision, 32-bit operation. atom.add.f32 rounds to nearest even and flushes subnormal inputs and results to sign-preserving zero.
So even though ordinary floating point add should not flush denormals to zero if you specify -ftz=false (which is the default, I believe, for nvcc), the floating point atomic add operation to global memory will flush to zero (always).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With