
Concurrency, 4 CUDA Applications competing to get GPU resources

What would happen if four concurrent CUDA applications compete for resources on one single GPU so they can offload work to the graphics card? The CUDA Programming Guide 3.1 mentions that certain methods are asynchronous:

  • Kernel launches
  • Device ↔ device memory copies
  • Host ↔ device memory copies of a memory block of 64 KB or less
  • Memory copies performed by functions that are suffixed with Async
  • Memory set function calls

It also mentions that devices with compute capability 2.0 are able to execute multiple kernels concurrently as long as the kernels belong to the same context.
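
For illustration, here is a minimal sketch of how these asynchronous calls are typically combined within a single application: two streams each receive a host-to-device copy and a kernel launch, so the copies and kernels may overlap on hardware that supports it. The scale kernel and buffer sizes are hypothetical.

    #include <cuda_runtime.h>
    #include <cstdio>

    // Hypothetical kernel, used only to illustrate asynchronous launches.
    __global__ void scale(float *data, int n, float factor) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main() {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        // Pinned host memory is needed for host/device copies to be truly asynchronous.
        float *h0, *h1, *d0, *d1;
        cudaMallocHost((void**)&h0, bytes);
        cudaMallocHost((void**)&h1, bytes);
        cudaMalloc((void**)&d0, bytes);
        cudaMalloc((void**)&d1, bytes);

        cudaStream_t s0, s1;
        cudaStreamCreate(&s0);
        cudaStreamCreate(&s1);

        // Each stream issues a copy and a kernel; the two streams are independent,
        // so a copy engine and the compute engine can work at the same time, and on
        // compute capability >= 2.0 the two kernels themselves may run concurrently.
        cudaMemcpyAsync(d0, h0, bytes, cudaMemcpyHostToDevice, s0);
        scale<<<(n + 255) / 256, 256, 0, s0>>>(d0, n, 2.0f);

        cudaMemcpyAsync(d1, h1, bytes, cudaMemcpyHostToDevice, s1);
        scale<<<(n + 255) / 256, 256, 0, s1>>>(d1, n, 0.5f);

        cudaDeviceSynchronize();   // wait for both streams to finish

        cudaStreamDestroy(s0);
        cudaStreamDestroy(s1);
        cudaFree(d0); cudaFree(d1);
        cudaFreeHost(h0); cudaFreeHost(h1);
        return 0;
    }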

Does this type of concurrency only apply to streams within a single CUDA application, and is it not possible when completely different applications are requesting GPU resources?

Does that mean that the concurrent support is only available within one application (context?), and that the four applications will only run concurrently in the sense that their calls may be overlapped by context switching on the CPU, but each application has to wait until the GPU is freed by the other applications? (i.e. a kernel launch from app4 waits until a kernel launch from app1 finishes.)

If that is the case, how can these four applications access GPU resources without suffering long waiting times?

asked Sep 14 '10 by Bartzilla

People also ask

What are the opportunities of concurrency in CUDA?

First, it gives each host thread its own default stream. This means that commands issued to the default stream by different host threads can run concurrently. Second, these default streams are regular streams. This means that commands in the default stream may run concurrently with commands in non-default streams.
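
This behaviour requires the per-thread default stream mode (for example, compiling with nvcc --default-stream per-thread, available since CUDA 7). A minimal sketch assuming that flag, with a hypothetical busy kernel launched from two host threads:

    // nvcc -std=c++11 --default-stream per-thread example.cu -o example
    #include <cuda_runtime.h>
    #include <thread>

    __global__ void busy() {                 // hypothetical do-nothing kernel
        for (volatile int i = 0; i < (1 << 20); ++i) { }
    }

    void worker() {
        // With per-thread default streams, this launch goes into this host
        // thread's own default stream, so the two threads' kernels may overlap.
        busy<<<1, 1>>>();
    }

    int main() {
        std::thread t1(worker), t2(worker);
        t1.join();
        t2.join();
        cudaDeviceSynchronize();             // wait for both kernels
        return 0;
    }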

What is the correct order of CUDA program processing?

In a typical CUDA program, data are first sent from main memory to GPU memory, then the CPU sends instructions to the GPU, then the GPU schedules and executes the kernel on the available parallel hardware, and finally the results are copied back from GPU memory to CPU memory. ...
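
A minimal sketch of that sequence (host-to-device copy, kernel launch, device-to-host copy); the addOne kernel and sizes are illustrative:

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void addOne(int *v, int n) {            // executed on the GPU
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) v[i] += 1;
    }

    int main() {
        const int n = 256;
        int h[256];
        for (int i = 0; i < n; ++i) h[i] = i;

        int *d;
        cudaMalloc((void**)&d, n * sizeof(int));

        cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);   // 1. host -> device
        addOne<<<1, n>>>(d, n);                                      // 2. CPU issues the kernel
        cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);   // 3. results back to host

        printf("h[0]=%d h[255]=%d\n", h[0], h[255]);                 // prints 1 and 256
        cudaFree(d);
        return 0;
    }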

What is the function of the __global__ qualifier in a CUDA program?

__global__ is a qualifier added to standard C. It alerts the compiler that a function should be compiled to run on the device (GPU) instead of the host (CPU).
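
A short sketch contrasting __global__ with the related __device__ and __host__ qualifiers; the function names are made up for illustration:

    #include <cuda_runtime.h>
    #include <cstdio>

    __device__ float square(float x) {           // callable only from GPU code
        return x * x;
    }

    __host__ __device__ float twice(float x) {   // compiled for both CPU and GPU
        return 2.0f * x;
    }

    // __global__: compiled for the device, launched from the host with <<< >>>.
    __global__ void transform(float *v, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) v[i] = twice(square(v[i]));
    }

    int main() {
        const int n = 8;
        float h[n] = {0, 1, 2, 3, 4, 5, 6, 7};
        float *d;
        cudaMalloc((void**)&d, n * sizeof(float));
        cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
        transform<<<1, n>>>(d, n);
        cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
        printf("h[3] = %f\n", h[3]);             // 2 * 3^2 = 18
        cudaFree(d);
        return 0;
    }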


1 Answer

As you said, only one "context" can occupy each of the engines at any given time. This means that one of the copy engines can be serving a memcpy for application A, the other a memcpy for application B, and the compute engine can be executing a kernel for application C (for example).

An application can actually have multiple contexts, but no two applications can share the same context (although threads within an application can share a context).

Any application that schedules work to run on the GPU (i.e. a memcpy or a kernel launch) can schedule the work asynchronously, so the application is free to go ahead and do other work on the CPU, and it can schedule any number of tasks to run on the GPU.
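
A minimal sketch of that pattern: the kernel launch returns immediately, the host does unrelated CPU work, and only blocks when the result is actually needed. The longRunningKernel below is hypothetical:

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void longRunningKernel(float *data, int n) {   // illustrative kernel
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            for (int k = 0; k < 1000; ++k) data[i] = data[i] * 1.000001f + 1.0f;
    }

    int main() {
        const int n = 1 << 20;
        float *d;
        cudaMalloc((void**)&d, n * sizeof(float));
        cudaMemset(d, 0, n * sizeof(float));

        // The launch is asynchronous: control returns to the CPU immediately.
        longRunningKernel<<<(n + 255) / 256, 256>>>(d, n);

        // The host is free to do unrelated CPU work while the GPU is busy.
        double acc = 0.0;
        for (int i = 0; i < 1000000; ++i) acc += i * 0.5;
        printf("CPU work done while the kernel runs: %f\n", acc);

        // Block only when the kernel's results are actually needed.
        cudaDeviceSynchronize();
        cudaFree(d);
        return 0;
    }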

Note that it is also possible to put the GPUs in exclusive mode whereby only one context can operate on the GPU at any time (i.e. all the resources are reserved for the context until the context is destroyed). The default is shared mode.
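
Exclusive or prohibited modes are set by the administrator (for example with nvidia-smi). An application can check which compute mode a device is in before creating a context; a minimal sketch querying device 0:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);   // query device 0

        // computeMode tells us whether other contexts may share this GPU.
        switch (prop.computeMode) {
            case cudaComputeModeDefault:
                printf("Default (shared): multiple contexts may use the device\n");
                break;
            case cudaComputeModeExclusive:
                printf("Exclusive thread: only one context at a time\n");
                break;
            case cudaComputeModeProhibited:
                printf("Prohibited: no contexts may be created on this device\n");
                break;
            default:
                printf("Exclusive process or other mode (%d)\n", prop.computeMode);
                break;
        }
        return 0;
    }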

answered Nov 06 '22 by Tom