Is it possible to launch two kernels that perform independent tasks simultaneously? For example, suppose I have this CUDA code:
// host and device initialization
.......
.......
// launch kernel1
myMethod1 <<<.... >>> (params);
// launch kernel2
myMethod2 <<<.....>>> (params);
Assuming that these kernels are independent, is there a facility to launch them at the same time, allocating a few grids/blocks to each? Does CUDA/OpenCL have this provision?
Only devices with CUDA compute capability 2.0 or higher (i.e. Fermi) can support multiple simultaneous kernel executions. See section 3.2.6.3 of the CUDA 3.0 programming guide, which states:
Some devices of compute capability 2.0 can execute multiple kernels concurrently. Applications may query this capability by calling cudaGetDeviceProperties() and checking the concurrentKernels property. The maximum number of kernel launches that a device can execute concurrently is four.
A kernel from one CUDA context cannot execute concurrently with a kernel from another CUDA context.
Kernels that use many textures or a large amount of local memory are less likely to execute concurrently with other kernels.
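The capability query described in the quoted passage could look roughly like the sketch below. The helper name concurrentKernelsSupported and the choice of device 0 are assumptions made for illustration; only cudaGetDeviceProperties() and the concurrentKernels field come from the guide.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns 1 if the device reports concurrent-kernel
// support, 0 if it does not, and -1 if the CUDA call itself fails.
int concurrentKernelsSupported(int device) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return -1;
    }
    printf("Device %d: %s (compute capability %d.%d)\n",
           device, prop.name, prop.major, prop.minor);
    return prop.concurrentKernels ? 1 : 0;
}

int main() {
    // Device 0 is an assumption; use cudaGetDeviceCount() to enumerate others.
    int supported = concurrentKernelsSupported(0);
    if (supported < 0) return 1;
    printf("concurrentKernels: %s\n", supported ? "yes" : "no");
    return 0;
}
```

A result of "yes" only means the hardware can overlap kernels; whether two particular kernels actually overlap still depends on the resources each one uses, as the quoted passage notes.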
You will need SM 2.0 or above for concurrent kernels.
To get concurrent execution you need to indicate explicitly that there is no dependence between the two kernels. This is because the compiler cannot determine that one kernel will not modify data being used by the other. Detecting that two kernels read from and write to the same buffer sounds simple enough, but it is actually much harder, since there can be pointers inside data structures and so on.
To express the independence you must launch the kernels in different streams. The fourth parameter in the triple-chevron syntax specifies the stream; see the Programming Guide or the concurrentKernels SDK sample.
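As a minimal sketch of launching in different streams, assuming two trivial kernels that write to separate buffers: the kernel bodies, the buffer size, the launch configuration, and the helper runBothKernels are all invented for illustration and merely stand in for the question's myMethod1 and myMethod2.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-ins for the question's kernels; each touches only its own buffer.
__global__ void myMethod1(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = 1.0f;
}

__global__ void myMethod2(float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) b[i] = 2.0f;
}

// Launches both kernels in separate streams and copies back the first
// element of each buffer. Returns true if every CUDA call succeeded.
bool runBothKernels(float *outA, float *outB) {
    const int n = 1024;              // small grids leave room for overlap
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    float *a = nullptr, *b = nullptr;
    cudaStream_t s1, s2;

    if (cudaMalloc(&a, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&b, n * sizeof(float)) != cudaSuccess) { cudaFree(a); return false; }
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // <<<grid, block, dynamic shared memory bytes, stream>>>
    myMethod1<<<blocks, threads, 0, s1>>>(a, n);
    myMethod2<<<blocks, threads, 0, s2>>>(b, n);

    // Wait for both streams, then check for launch or execution errors.
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(outA, a, sizeof(float), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(outB, b, sizeof(float), cudaMemcpyDeviceToHost);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return err == cudaSuccess;
}

int main() {
    float x = 0.0f, y = 0.0f;
    if (!runBothKernels(&x, &y)) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    printf("a[0] = %f, b[0] = %f\n", x, y);
    return 0;
}
```

Putting the kernels in different streams only permits overlap; it does not guarantee it. If the first kernel already occupies the whole device, the second will likely still run after it, so a profiler timeline is the reliable way to confirm that the two actually ran concurrently.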