Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cuda atomics change flag

Tags:

atomic

cuda

I have a piece of serial code which does something like this

if( ! variable )
{
  do some initialization here 
  variable = true;
}

I understand that this works perfectly fine in serial and will only be executed once. What atomics operation would be the correct one here in CUDA?

like image 383
ThatQuantDude Avatar asked Dec 25 '22 20:12

ThatQuantDude


1 Answers

It looks to me like what you want is a "critical section" in your code. A critical section allows one thread to execute a sequence of instructions while preventing any other thread or threadblock from executing those instructions.

A critical section can be used to control access to a memory area, for example, so as to allow un-conflicted access to that area by a single thread.

Atomics by themselves can only be used for a very limited, basically single operation, on a single variable. But atomics can be used to build a critical section.

You should use the following code in your kernel to control thread access to a critical section:

__syncthreads();
if (threadIdx.x == 0)
  acquire_semaphore(&sem);
__syncthreads();
  //begin critical section
  // ... your critical section code goes here
  //end critical section
__threadfence(); // not strictly necessary for the lock, but to make any global updates in the critical section visible to other threads in the grid
__syncthreads();
if (threadIdx.x == 0)
  release_semaphore(&sem);
__syncthreads();

Prior to the kernel define these helper functions and device variable:

__device__ volatile int sem = 0;

__device__ void acquire_semaphore(volatile int *lock){
  while (atomicCAS((int *)lock, 0, 1) != 0);
  }

__device__ void release_semaphore(volatile int *lock){
  *lock = 0;
  __threadfence();
  }

I have tested and used successfully the above code. Note that it essentially arbitrates between threadblocks using thread 0 in each threadblock as a requestor. You should further condition (e.g. if (threadIdx.x < ...)) your critical section code if you want only one thread in the winning threadblock to execute the critical section code.

Having multiple threads within a warp arbitrate for a semaphore presents additional complexities, so I don't recommend that approach. Instead, have each threadblock arbitrate as I have shown here, and then control your behavior within the winning threadblock using ordinary threadblock communication/synchronization methods (e.g. __syncthreads(), shared memory, etc.)

Note that this methodology will be costly to performance. You should only use critical sections when you cannot figure out how to otherwise parallelize your algorithm.

Finally, a word of warning. As in any threaded parallel architecture, improper use of critical sections can lead to deadlock. In particular, making assumptions about order of execution of threadblocks and/or warps within a threadblock is a flawed approach.

Here is an example of usage of binary_semaphore to implement a single device global "lock" that could be used for access control to a critical section.

like image 114
Robert Crovella Avatar answered Jan 05 '23 17:01

Robert Crovella