
CUDA - what is this loop doing

Tags:

cuda

Hey, I've seen this example kernel on a website:

    __global__ void loop1( int N, float alpha, float* x, float* y ) {
        int i;
        int i0 = blockIdx.x*blockDim.x + threadIdx.x;

        for (i = i0; i < N; i += blockDim.x*gridDim.x) {
            y[i] = alpha*x[i] + y[i];
        }
    }

It is meant to compute this function, written in C:

   for(i=0;i<N;i++) {
      y[i] = alpha*x[i] + y[i];
   }

Surely the for loop inside the kernel isn't necessary, and you can just do y[i0] = alpha*x[i0] + y[i0] and remove the for loop altogether?

I'm just curious as to why it's there and what its purpose is. This is assuming a kernel call such as loop1<<<64,256>>>, so presumably gridDim.x = 1.

user660414 asked Mar 16 '11 20:03



1 Answer

You need the for loop in the kernel if your vector has more entries than the number of threads you launched. If possible, it is of course more efficient to launch enough threads.

moggi answered Sep 29 '22 13:09