
Meaning of the following CUDA kernel launch syntax

What is the meaning of the following syntax?

Kernel_fun<<<256, 128, 2056>>>(arg1, arg2, arg3);

Which value indicates the workgroup and which value indicates the thread?

user1046900 asked Aug 20 '12 12:08



1 Answer

From the CUDA Programming Guide, appendix B.22 (as of May 2019):

The execution configuration is specified by inserting an expression of the form <<< Dg, Db, Ns, S >>> between the function name and the parenthesized argument list, where:

  • Dg is of type dim3 (see Section B.3.2) and specifies the dimension and size of the grid, such that Dg.x * Dg.y * Dg.z equals the number of blocks being launched; Dg.z must be equal to 1 for devices of compute capability 1.x;

  • Db is of type dim3 (see Section B.3.2) and specifies the dimension and size of each block, such that Db.x * Db.y * Db.z equals the number of threads per block;

  • Ns is of type size_t and specifies the number of bytes in shared memory that is dynamically allocated per block for this call in addition to the statically allocated memory; this dynamically allocated memory is used by any of the variables declared as an external array as mentioned in Section B.2.3; Ns is an optional argument which defaults to 0;

  • S is of type cudaStream_t and specifies the associated stream; S is an optional argument which defaults to 0.

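In short: <<< number of blocks, number of threads per block, bytes of dynamic shared memory per block, associated stream >>>

For the call in the question, that means 256 blocks, 128 threads per block, and 2056 bytes of dynamically allocated shared memory per block. No stream is given, so the default stream (0) is used. A CUDA block is what OpenCL calls a workgroup.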
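To make the four arguments concrete, here is a minimal sketch that uses all of them. The kernel reverse_ids, the helper run_demo, and the buffer names are hypothetical and chosen only for illustration; the shared memory size is picked to fit this kernel (one int per thread) rather than the 2056 bytes from the question.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stores its global ID in dynamic shared
// memory, then reads back the ID stored by the mirrored thread of its block.
__global__ void reverse_ids(int *out)
{
    extern __shared__ int scratch[];   // sized by the third <<<...>>> argument (Ns)

    int global_id = blockIdx.x * blockDim.x + threadIdx.x;
    scratch[threadIdx.x] = global_id;
    __syncthreads();                   // wait until the whole block has written

    out[global_id] = scratch[blockDim.x - 1 - threadIdx.x];
}

// Launches the kernel and returns the number of wrong elements
// (0 on success, -1 if a CUDA call failed).
int run_demo()
{
    const int blocks = 256;                              // Dg: blocks in the grid
    const int threads = 128;                             // Db: threads per block
    const size_t shared_bytes = threads * sizeof(int);   // Ns: bytes per block
    const int n = blocks * threads;

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;

    cudaStream_t stream;                                 // S: stream for the launch
    if (cudaStreamCreate(&stream) != cudaSuccess) { cudaFree(d_out); return -1; }

    reverse_ids<<<blocks, threads, shared_bytes, stream>>>(d_out);

    std::vector<int> h_out(n);
    cudaError_t err = cudaGetLastError();                // catches bad launch configs
    if (err == cudaSuccess) err = cudaStreamSynchronize(stream);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaStreamDestroy(stream);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int b = 0; b < blocks; ++b)
        for (int t = 0; t < threads; ++t)
            if (h_out[b * threads + t] != b * threads + (threads - 1 - t))
                ++mismatches;
    return mismatches;
}

int main()
{
    int result = run_demo();
    std::printf("run_demo returned %d\n", result);
    return result == 0 ? 0 : 1;
}
```

Passing plain integers for the first two arguments works because they are implicitly converted to dim3 with the y and z components set to 1. For a 2D or 3D layout, construct the values explicitly, e.g. dim3(16, 16).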

Esoteric Screen Name answered Sep 28 '22 04:09