I have a piece of CUDA code in which threads are performing atomic operations on shared memory. I was thinking since the result of atomic operation will be visible to other threads of the block instantly anyways, it might be good to instruct the compiler to have the shared memory volatile
.
So I changed
__global__ void CoalescedAtomicOnSharedMem(int* data, uint nElem)
{
__shared__ int smem_data[BLOCK_SIZE];
uint tid = (blockIdx.x * blockDim.x) + threadIdx.x;
for ( uint i = tid; i < nElem; i += blockDim.x*gridDim.x){
atomicAdd( smem_data+threadIdx.x, 6);
}
}
to
__global__ void volShared_CoalescedAtomicOnSharedMem(int* data, uint nElem)
{
volatile __shared__ int smem_data[BLOCK_SIZE];
uint tid = (blockIdx.x * blockDim.x) + threadIdx.x;
for ( uint i = tid; i < nElem; i += blockDim.x*gridDim.x){
atomicAdd( smem_data+threadIdx.x, 6);
}
}
Below compile-time error happens having above change:
error: no instance of overloaded function "atomicAdd" matches the argument list
argument types are: (volatile int *, int)
Why isn't a volatile
address supported as an argument for atomic operations? Is it because compiler already treats the shared memory as volatile as soon as it identifies there's going to be atomic operations on it?
The definition of the volatile
qualifier is given in the programming guide. It instructs the compiler to always generate a read or write for that access, and never "optimize" it into a register or some other optimization.
Since atomic operations are guaranteed to act on actual memory locations (either shared or global) the combination of the two is unnecessary. Therefore, versions of atomic functions prototyped for volatile
qualifier are not provided.
If you have a memory location that is already declared as volatile
, simply cast it to the corresponding non-volatile
type when you pass the address to your atomic function. The behavior will be as expected.(example)
Therefore, atomic operations can operate on locations specified as volatile
with this proviso.
The simple fact that you have accessed a particular location using atomics somewhere in your code does not mean that the compiler will treat every access elsewhere as implicitly volatile
. If you need volatile
behavior elsewhere, declare it explicitly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With