
Concurrency, 4 CUDA Applications competing to get GPU resources

What would happen if four concurrent CUDA applications compete for resources on one single GPU so they can offload work to the graphics card? The CUDA Programming Guide 3.1 mentions that certain methods are asynchronous:

  • Kernel launches
  • Device ↔ device memory copies
  • Host ↔ device memory copies of a memory block of 64 KB or less
  • Memory copies performed by functions that are suffixed with Async
  • Memory set function calls

It also mentions that devices with compute capability 2.0 are able to execute multiple kernels concurrently as long as the kernels belong to the same context.
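
For illustration, here is a minimal sketch of how these asynchronous calls are typically combined within a single application: two streams each receive a host-to-device copy and a kernel launch, so the copies and kernels may overlap on hardware that supports it. The scale kernel and buffer sizes are hypothetical.

    #include <cuda_runtime.h>
    #include <cstdio>

    // Hypothetical kernel, used only to illustrate asynchronous launches.
    __global__ void scale(float *data, int n, float factor) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main() {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        // Pinned host memory is needed for host/device copies to be truly asynchronous.
        float *h0, *h1, *d0, *d1;
        cudaMallocHost((void**)&h0, bytes);
        cudaMallocHost((void**)&h1, bytes);
        cudaMalloc((void**)&d0, bytes);
        cudaMalloc((void**)&d1, bytes);

        cudaStream_t s0, s1;
        cudaStreamCreate(&s0);
        cudaStreamCreate(&s1);

        // Each stream issues a copy and a kernel; the two streams are independent,
        // so a copy engine and the compute engine can work at the same time, and on
        // compute capability >= 2.0 the two kernels themselves may run concurrently.
        cudaMemcpyAsync(d0, h0, bytes, cudaMemcpyHostToDevice, s0);
        scale<<<(n + 255) / 256, 256, 0, s0>>>(d0, n, 2.0f);

        cudaMemcpyAsync(d1, h1, bytes, cudaMemcpyHostToDevice, s1);
        scale<<<(n + 255) / 256, 256, 0, s1>>>(d1, n, 0.5f);

        cudaDeviceSynchronize();   // wait for both streams to finish

        cudaStreamDestroy(s0);
        cudaStreamDestroy(s1);
        cudaFree(d0); cudaFree(d1);
        cudaFreeHost(h0); cudaFreeHost(h1);
        return 0;
    }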

Does this type of concurrency only apply to streams within a single CUDA application, and is it not possible when completely different applications are requesting GPU resources?

Does that mean that the concurrent support is only available within one application (context?), and that the four applications will only run concurrently in the sense that their calls may be overlapped by context switching on the CPU, but each application has to wait until the GPU is freed by the other applications? (i.e. a kernel launch from app4 waits until a kernel launch from app1 finishes.)

If that is the case, how can these four applications access GPU resources without suffering long waiting times?

asked Sep 14 '10 by Bartzilla

People also ask

What are the opportunities of concurrency in CUDA?

First, it gives each host thread its own default stream. This means that commands issued to the default stream by different host threads can run concurrently. Second, these default streams are regular streams. This means that commands in the default stream may run concurrently with commands in non-default streams.
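
This behaviour requires the per-thread default stream mode (for example, compiling with nvcc --default-stream per-thread, available since CUDA 7). A minimal sketch assuming that flag, with a hypothetical busy kernel launched from two host threads:

    // nvcc -std=c++11 --default-stream per-thread example.cu -o example
    #include <cuda_runtime.h>
    #include <thread>

    __global__ void busy() {                 // hypothetical do-nothing kernel
        for (volatile int i = 0; i < (1 << 20); ++i) { }
    }

    void worker() {
        // With per-thread default streams, this launch goes into this host
        // thread's own default stream, so the two threads' kernels may overlap.
        busy<<<1, 1>>>();
    }

    int main() {
        std::thread t1(worker), t2(worker);
        t1.join();
        t2.join();
        cudaDeviceSynchronize();             // wait for both kernels
        return 0;
    }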

What is the correct order of CUDA program processing?

In a typical CUDA program, data are first sent from main memory to GPU memory, then the CPU sends instructions to the GPU, then the GPU schedules and executes the kernel on the available parallel hardware, and finally the results are copied back from GPU memory to CPU memory. ...
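
A minimal sketch of that sequence (host-to-device copy, kernel launch, device-to-host copy); the addOne kernel and sizes are illustrative:

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void addOne(int *v, int n) {            // executed on the GPU
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) v[i] += 1;
    }

    int main() {
        const int n = 256;
        int h[256];
        for (int i = 0; i < n; ++i) h[i] = i;

        int *d;
        cudaMalloc((void**)&d, n * sizeof(int));

        cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);   // 1. host -> device
        addOne<<<1, n>>>(d, n);                                      // 2. CPU issues the kernel
        cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);   // 3. results back to host

        printf("h[0]=%d h[255]=%d\n", h[0], h[255]);                 // prints 1 and 256
        cudaFree(d);
        return 0;
    }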

What is the function of the __global__ qualifier in a CUDA program?

__global__ is a qualifier added to standard C. It alerts the compiler that a function should be compiled to run on the device (GPU) instead of the host (CPU).
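
A short sketch contrasting __global__ with the related __device__ and __host__ qualifiers; the function names are made up for illustration:

    #include <cuda_runtime.h>
    #include <cstdio>

    __device__ float square(float x) {           // callable only from GPU code
        return x * x;
    }

    __host__ __device__ float twice(float x) {   // compiled for both CPU and GPU
        return 2.0f * x;
    }

    // __global__: compiled for the device, launched from the host with <<< >>>.
    __global__ void transform(float *v, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) v[i] = twice(square(v[i]));
    }

    int main() {
        const int n = 8;
        float h[n] = {0, 1, 2, 3, 4, 5, 6, 7};
        float *d;
        cudaMalloc((void**)&d, n * sizeof(float));
        cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
        transform<<<1, n>>>(d, n);
        cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
        printf("h[3] = %f\n", h[3]);             // 2 * 3^2 = 18
        cudaFree(d);
        return 0;
    }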


1 Answer

As you said, only one "context" can occupy each of the engines at any given time. This means that one of the copy engines can be serving a memcpy for application A, the other a memcpy for application B, and the compute engine can be executing a kernel for application C (for example).

An application can actually have multiple contexts, but no two applications can share the same context (although threads within an application can share a context).

Any application that schedules work to run on the GPU (i.e. a memcpy or a kernel launch) can schedule the work asynchronously, so the application is free to go ahead and do other work on the CPU, and it can schedule any number of tasks to run on the GPU.
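
A minimal sketch of that pattern: the kernel launch returns immediately, the host does unrelated CPU work, and only blocks when the result is actually needed. The longRunningKernel below is hypothetical:

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void longRunningKernel(float *data, int n) {   // illustrative kernel
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            for (int k = 0; k < 1000; ++k) data[i] = data[i] * 1.000001f + 1.0f;
    }

    int main() {
        const int n = 1 << 20;
        float *d;
        cudaMalloc((void**)&d, n * sizeof(float));
        cudaMemset(d, 0, n * sizeof(float));

        // The launch is asynchronous: control returns to the CPU immediately.
        longRunningKernel<<<(n + 255) / 256, 256>>>(d, n);

        // The host is free to do unrelated CPU work while the GPU is busy.
        double acc = 0.0;
        for (int i = 0; i < 1000000; ++i) acc += i * 0.5;
        printf("CPU work done while the kernel runs: %f\n", acc);

        // Block only when the kernel's results are actually needed.
        cudaDeviceSynchronize();
        cudaFree(d);
        return 0;
    }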

Note that it is also possible to put the GPUs in exclusive mode whereby only one context can operate on the GPU at any time (i.e. all the resources are reserved for the context until the context is destroyed). The default is shared mode.
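
Exclusive or prohibited modes are set by the administrator (for example with nvidia-smi). An application can check which compute mode a device is in before creating a context; a minimal sketch querying device 0:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);   // query device 0

        // computeMode tells us whether other contexts may share this GPU.
        switch (prop.computeMode) {
            case cudaComputeModeDefault:
                printf("Default (shared): multiple contexts may use the device\n");
                break;
            case cudaComputeModeExclusive:
                printf("Exclusive thread: only one context at a time\n");
                break;
            case cudaComputeModeProhibited:
                printf("Prohibited: no contexts may be created on this device\n");
                break;
            default:
                printf("Exclusive process or other mode (%d)\n", prop.computeMode);
                break;
        }
        return 0;
    }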

answered Nov 06 '22 by Tom