 

Multiple processes launching CUDA kernels in parallel

Tags:

cuda

gpu

I know that NVIDIA GPUs with compute capability 2.x or greater can execute up to 16 kernels concurrently. However, my application spawns 7 processes, and each of these 7 processes launches CUDA kernels.

My first question is: what is the expected behavior of these kernels? Will they execute concurrently as well, or, since they are launched by different processes, will they execute sequentially?

I am confused because the CUDA C programming guide says:

"A kernel from one CUDA context cannot execute concurrently with a kernel from another CUDA context." This brings me to my second question, what are CUDA "contexts"?

Thanks!

asked Feb 15 '13 by user2075543




1 Answer

A CUDA context is a virtual execution space that holds the code and data owned by a host thread or process. Only one context can ever be active on a GPU with all current hardware.
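To make the concept concrete, here is a minimal sketch of a context's lifetime using the CUDA driver API. Every process (or, here, a host thread using the driver API explicitly) that touches the GPU ends up owning a context like the one created below; error checking is omitted for brevity:

```cuda
// Sketch: explicit context creation with the CUDA driver API.
#include <cuda.h>

int main(void)
{
    CUdevice  dev;
    CUcontext ctx;

    cuInit(0);                  // initialise the driver API
    cuDeviceGet(&dev, 0);       // first GPU in the system
    cuCtxCreate(&ctx, 0, dev);  // create a context and make it current
                                // on this thread

    // ... allocate memory, load modules, launch kernels here ...

    cuCtxDestroy(ctx);          // release the context
    return 0;
}
```

With the runtime API you never see these calls; a context is created implicitly on the first CUDA call a process makes, which is why each of your 7 processes ends up with its own context on the device.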

So to answer your first question: if you have seven separate threads or processes all trying to establish a context and run on the same GPU simultaneously, they will be serialised, and any process waiting for access to the GPU will be blocked until the owner of the running context yields. There is, to the best of my knowledge, no time slicing, and the scheduling heuristics are not documented and (I would suspect) not uniform from operating system to operating system.

You would be better off launching a single worker thread that holds the GPU context and using messaging from the other threads to push work onto the GPU. Alternatively, there is a context migration facility in the CUDA driver API, but that only works with threads from the same process, and the migration mechanism has latency and host CPU overhead.
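The single-worker pattern can be sketched roughly as below. This is an illustrative outline, not a production design: the kernel `myKernel` and the `WorkItem` layout are hypothetical, and the runtime API is assumed (so the worker thread's first CUDA call implicitly creates the one context):

```cuda
// Sketch: one thread owns the GPU context and drains a work queue;
// producer threads only enqueue items and never touch CUDA directly.
#include <cuda_runtime.h>
#include <condition_variable>
#include <mutex>
#include <queue>

struct WorkItem { float *data; int n; };   // hypothetical payload

static std::queue<WorkItem>     q;
static std::mutex               m;
static std::condition_variable  cv;
static bool                     done = false;

__global__ void myKernel(float *data, int n)   // hypothetical kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

void gpuWorker()   // run this in exactly one thread
{
    // The first CUDA call made here binds the context to this thread.
    for (;;) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return done || !q.empty(); });
        if (q.empty()) break;              // done and drained
        WorkItem w = q.front(); q.pop();
        lk.unlock();

        myKernel<<<(w.n + 255) / 256, 256>>>(w.data, w.n);
        cudaDeviceSynchronize();           // or stream/event-based completion
    }
}

void submit(WorkItem w)   // callable from any producer thread
{
    { std::lock_guard<std::mutex> lk(m); q.push(w); }
    cv.notify_one();
}
```

Because only the worker thread ever issues CUDA calls, there is a single context and no inter-context serialisation; kernels from different producers can then overlap on concurrent-kernel-capable hardware if you launch them in separate streams.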

answered Sep 29 '22 by talonmies