The CUDA documentation does not specify how many CUDA processes can share one GPU. For example, if I launch more than one CUDA program as the same user on a system with only one GPU card installed, what is the effect? Will correctness of execution be guaranteed? How does the GPU schedule tasks in this case?
Multiple applications may run at the same time on the same GPU. That is, multiple applications can each hold a CUDA context at the same time and launch kernels, copy memory, and so on. However, kernels from different CUDA contexts cannot execute simultaneously on the same GPU.
The answer is: your applications can use every CUDA GPU you want. Multiple different graphics cards and multiple different GPUs can be handled by a CUDA application, as long as you manage them yourself. Check the CUDA FAQ, section "Hardware and Architecture", and the Multi-GPU slides, both official NVIDIA resources.
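A minimal sketch of what "managing them yourself" can look like with the CUDA runtime API (the loop body is a placeholder; real work would go where the comment indicates):

```
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    printf("Found %d CUDA device(s)\n", deviceCount);

    // Iterate over all visible devices and make each one current in turn.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s\n", dev, prop.name);

        cudaSetDevice(dev);  // subsequent allocations/kernels target this GPU
        // ... allocate memory, launch kernels, etc., for this device here ...
    }
    return 0;
}
```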
To run multiple instances of a single-GPU application on different GPUs, you can use the CUDA environment variable CUDA_VISIBLE_DEVICES. The variable restricts execution to a specific set of devices. To use it, set CUDA_VISIBLE_DEVICES to a comma-separated list of GPU IDs before launching each instance.
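For illustration, a small sketch that only reports what the process is allowed to see (the binary name ./app in the usage note below is a hypothetical example):

```
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    // The process only sees the GPUs listed in CUDA_VISIBLE_DEVICES,
    // renumbered starting from 0.
    const char *visible = getenv("CUDA_VISIBLE_DEVICES");
    printf("CUDA_VISIBLE_DEVICES=%s\n", visible ? visible : "(unset)");

    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    printf("This process sees %d device(s)\n", deviceCount);
    return 0;
}
```

Launching it twice, e.g. as CUDA_VISIBLE_DEVICES=0 ./app and CUDA_VISIBLE_DEVICES=1 ./app, pins each instance to a different physical GPU; inside each process the assigned GPU shows up as device 0.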
CUDA activity from independent host processes will normally create independent CUDA contexts, one for each process. Thus, the CUDA activity launched from separate host processes will take place in separate CUDA contexts, on the same device.
CUDA activity in separate contexts will be serialized. The GPU will execute the activity from one process, and when that activity is idle, it can and will context-switch to another context to complete the CUDA activity launched from the other process. The detailed inter-context scheduling behavior is not specified. (Running multiple contexts on a single GPU also cannot normally violate basic GPU limits, such as memory availability for device allocations.)

Note that the inter-context switching/scheduling behavior is unspecified and may vary with the machine setup. Casual observation or micro-benchmarking may suggest that kernels from separate processes on newer devices can run concurrently (outside of MPS), but this is not correct. Newer machine setups may exhibit time-sliced rather than round-robin behavior, but that does not change the fact that, at any given instant in time, code from only one context can run.
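One rough way to see this effect (a sketch, not a rigorous benchmark; the grid size and iteration count are arbitrary choices): time a long-running kernel with CUDA events and compare the reported time when one instance runs alone versus when two instances of the same program run from separate shells.

```
#include <cstdio>
#include <cuda_runtime.h>

// A deliberately long-running kernel so the overlap (or lack of it)
// between two processes is visible in the measured time.
__global__ void spin(float *out, int iters) {
    float x = threadIdx.x * 0.001f;
    for (int i = 0; i < iters; ++i)
        x = x * 1.0000001f + 0.000001f;
    out[threadIdx.x + blockIdx.x * blockDim.x] = x;
}

int main() {
    float *d_out;
    cudaMalloc(&d_out, 1024 * 1024 * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    spin<<<1024, 1024>>>(d_out, 1 << 20);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.1f ms\n", ms);

    cudaFree(d_out);
    return 0;
}
```

Without MPS, the per-process kernel time typically grows when a second instance is started, because the GPU is switching between the two contexts rather than running both kernels at once.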
The "exception" to this case (serialization of GPU activity from independent host processes) would be the CUDA Multi-Process Server. In a nutshell, the MPS acts as a "funnel" to collect CUDA activity emanating from several host processes, and run that activity as if it emanated from a single host process. The principal benefit is to avoid the serialization of kernels which might otherwise be able to run concurrently. The canonical use-case would be for launching multiple MPI ranks that all intend to use a single GPU resource.
Note that the above description applies to GPUs in the "Default" compute mode. GPUs in "Exclusive Process" compute mode (or the older "Exclusive Thread" mode) will reject attempts to create more than one process/context on a single device. In one of these modes, attempts by other processes to use a device already in use result in a CUDA API error. The compute mode can be changed in some cases using the nvidia-smi utility.
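A short sketch of how an application can inspect the compute mode it is subject to (device 0 is assumed; the enum values come from the CUDA runtime's cudaComputeMode):

```
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // query device 0

    // prop.computeMode indicates whether other processes may share this GPU.
    switch (prop.computeMode) {
        case cudaComputeModeDefault:
            printf("Default: multiple host processes/contexts allowed\n");
            break;
        case cudaComputeModeExclusiveProcess:
            printf("Exclusive Process: only one process may hold a context\n");
            break;
        case cudaComputeModeProhibited:
            printf("Prohibited: no contexts can be created on this device\n");
            break;
        default:
            printf("Other/legacy compute mode (%d)\n", prop.computeMode);
    }
    return 0;
}
```

The mode itself is set outside the application, typically by an administrator, with something like nvidia-smi -c EXCLUSIVE_PROCESS.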