
Declaring Variables in a CUDA kernel

Tags:

cuda

Say you declare a new variable in a CUDA kernel and then use it in multiple threads, like:

__global__ void kernel(float* delt, float* deltb) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float a;
    a = delt[i] + deltb[i];
    a += 1;
}

and the kernel call looks something like below, with multiple threads and blocks:

int threads = 200;
uint3 blocks = make_uint3(200,1,1);
kernel<<<blocks,threads>>>(d_delt, d_deltb);
  1. Is "a" stored on the stack?
  2. Is a new "a" created for each thread when they are initialized?
  3. Or will each thread independently access "a" at an unknown time, potentially messing up the algorithm?
John W. asked Jul 29 '13 20:07


2 Answers

Any variable (scalar or array) declared inside a kernel function without a __shared__ qualifier is local to each thread: every thread gets its own copy of that variable, so no data race among threads will occur.
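A minimal sketch (hypothetical kernel names) of the contrast: the local variable a below is private to each thread, while a __shared__ array is one copy per block, visible to all of that block's threads, and so needs synchronisation:

    __global__ void per_thread_example(const float* delt, const float* deltb, float* out) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;

        // "a" is a local (automatic) variable: every thread gets its own
        // copy, typically held in a register. No other thread can see it.
        float a = delt[i] + deltb[i];
        a += 1.0f;
        out[i] = a;
    }

    __global__ void shared_example(const float* delt, float* out) {
        // By contrast, a __shared__ array is one copy per *block*, visible
        // to all threads of that block, so access must be synchronised.
        __shared__ float buf[256];
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        buf[threadIdx.x] = delt[i];
        __syncthreads();  // required before reading other threads' writes
        out[i] = buf[threadIdx.x];
    }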

The compiler chooses whether a local variable resides in registers or in local memory (which physically lives in off-chip global memory), depending on the transformations and optimizations it performs.

Further details on which variables are placed in local memory can be found in the NVIDIA CUDA C Programming Guide, section 5.3.2.2 (Local Memory).

Luca Ferraro answered Nov 07 '22 12:11


None of the above. The CUDA compiler is smart enough, and aggressive enough with optimisations, that it can detect that a is unused, so the complete code can be optimised away. You can confirm this by compiling the kernel with -Xptxas=-v and looking at the resource counts, which should show essentially no registers and no local memory usage.
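For example (hypothetical file name), passing the flag through nvcc makes ptxas print per-kernel resource usage:

    # Ask ptxas to report register and memory usage for each kernel.
    nvcc -Xptxas=-v -c kernel.cu

    # The report for each kernel includes a line of roughly this shape
    # (exact numbers depend on the GPU architecture and compiler version):
    #   ptxas info : Used 2 registers, 328 bytes cmem[0]

If the dead code has been eliminated, the register count is minimal and no local-memory or spill figures appear.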

In a less trivial example, a would probably be stored in a per-thread register, or in per-thread local memory, which resides in off-die DRAM.

talonmies answered Nov 07 '22 12:11