I am doing a project on GPU, and I have to use atomicAdd() for double, because the cuda does not support it for double, so I use the code below, which is NVIDIA provide.
__device__ double atomicAdd(double* address, double val)
{
unsigned long long int* address_as_ull =
(unsigned long long int*)address;
unsigned long long int old = *address_as_ull, assumed;
do {
assumed = old;
old = atomicCAS(address_as_ull, assumed,
__double_as_longlong(val +
__longlong_as_double(assumed)));
} while (assumed != old);
return __longlong_as_double(old);
}
Now I want to know why the implement require a loop, while (assumed!=old)
Basically because the implementation requires a load, which can't be performed atomically. The compare-and-swap operation is an atomic version of
(*address == assumed) ? (assumed + val) : *address
There is no guarantee the the value at *address
won't change between the cycle that the value is loaded from *address
and the cycle that the atomicCAS
call is used to store the updated value. If that happens, the value at *address
won't be updated. Therefore the loop ensures that the two operations are repeated until there is no change of the value at *address
between the read and the compare-and-swap operation, which implies that the update took place.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With