
Simultaneous launch of Multiple Kernels using CUDA for a GPU

Is it possible to launch two kernels that perform independent tasks simultaneously? For example, suppose I have this CUDA code:

// host and device initialization
.......
.......

// launch kernel1
myMethod1 <<<.... >>> (params);

// launch kernel2
myMethod2 <<<.....>>> (params);

Assuming that these kernels are independent, is there a facility to launch them at the same time, allocating a few grids/blocks to each? Does CUDA/OpenCL have this provision?

cudadev asked Jun 14 '10 04:06



2 Answers

Only devices with CUDA compute capability 2.0 or higher (i.e. Fermi and later) can support multiple simultaneous kernel executions. See section 3.2.6.3 of the CUDA 3.0 programming guide, which states:

Some devices of compute capability 2.0 can execute multiple kernels concurrently. Applications may query this capability by calling cudaGetDeviceProperties() and checking the concurrentKernels property.

The maximum number of kernel launches that a device can execute concurrently is four.

A kernel from one CUDA context cannot execute concurrently with a kernel from another CUDA context.

Kernels that use many textures or a large amount of local memory are less likely to execute concurrently with other kernels.
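
The capability query mentioned in the quoted passage can be sketched roughly as below. This is a minimal example, not taken from the guide: the helper name supportsConcurrentKernels and the choice of device 0 are assumptions for illustration, while cudaGetDeviceProperties() and the concurrentKernels field are the ones the guide names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API).
// Returns 1 if the device reports concurrent-kernel support, 0 if it does not,
// and -1 if the query itself fails (for example, no CUDA device is present).
int supportsConcurrentKernels(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return -1;
    }
    return prop.concurrentKernels ? 1 : 0;
}

int main() {
    // Device 0 is an assumption; a multi-GPU program would loop over all devices.
    int result = supportsConcurrentKernels(0);
    if (result < 0) {
        printf("Could not query device 0\n");
    } else {
        printf("Device 0 concurrentKernels: %s\n", result ? "yes" : "no");
    }
    return 0;
}
```

A non-zero concurrentKernels only says the hardware can overlap kernels; whether two particular kernels actually overlap still depends on the resources each one uses, as the quoted text notes.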

Edric answered Nov 03 '22 01:11



You will need SM 2.0 or above for concurrent kernels.

To get concurrent execution, you need to indicate explicitly that there is no dependence between the two kernels, because the compiler cannot determine that one kernel will not modify data being used by the other. Two kernels reading from and writing to the same buffer may sound simple enough to detect, but it is actually much harder, since there can be pointers inside data structures and so on.

To express the independence, you must launch the kernels in different streams. The fourth parameter in the triple-chevron syntax specifies the stream; check out the Programming Guide or the SDK concurrentKernels sample.
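
As a rough sketch of what launching in different streams looks like (this is not the SDK sample): the kernels scaleKernel and offsetKernel and the helper runIndependentKernels are hypothetical stand-ins for myMethod1 and myMethod2, and the buffer sizes and launch dimensions are arbitrary. Separate streams only permit overlap; the hardware and the kernels' resource usage decide whether it actually happens.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-ins for myMethod1 / myMethod2: each touches its own buffer.
__global__ void scaleKernel(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

__global__ void offsetKernel(float *data, int n, float offset) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += offset;
}

// Hypothetical helper: runs the two kernels in separate streams and copies the
// results back. Assumes a and b have the same size. Returns true on success.
bool runIndependentKernels(std::vector<float> &a, std::vector<float> &b) {
    const int n = static_cast<int>(a.size());
    const size_t bytes = n * sizeof(float);
    float *dA = nullptr, *dB = nullptr;
    if (cudaMalloc(&dA, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dB, bytes) != cudaSuccess) { cudaFree(dA); return false; }

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // <<<grid, block, dynamic shared memory bytes, stream>>>
    // Different streams tell the runtime the launches do not depend on each other.
    scaleKernel<<<blocks, threads, 0, s1>>>(dA, n, 2.0f);
    offsetKernel<<<blocks, threads, 0, s2>>>(dB, n, 1.0f);

    // Wait for both streams, then check for launch or execution errors.
    cudaError_t err1 = cudaStreamSynchronize(s1);
    cudaError_t err2 = cudaStreamSynchronize(s2);
    bool ok = (err1 == cudaSuccess) && (err2 == cudaSuccess) &&
              (cudaGetLastError() == cudaSuccess);

    if (ok) {
        cudaMemcpy(a.data(), dA, bytes, cudaMemcpyDeviceToHost);
        cudaMemcpy(b.data(), dB, bytes, cudaMemcpyDeviceToHost);
    }

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(dA);
    cudaFree(dB);
    return ok;
}

int main() {
    std::vector<float> a(1024, 1.0f), b(1024, 0.0f);
    if (runIndependentKernels(a, b)) {
        printf("a[0] = %f, b[0] = %f\n", a[0], b[0]);
    } else {
        printf("CUDA call failed (is a CUDA device available?)\n");
    }
    return 0;
}
```

Kernels launched without a stream argument go to the default stream, which by default serializes with work in other streams, so both launches here use explicitly created streams.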

Tom answered Nov 03 '22 01:11
