
Multiple CUDA contexts for one device - any sense?

I thought I had a grasp of this, but apparently I do not :) I need to perform parallel H.264 stream encoding with NVENC from frames that are not in any of the formats accepted by the encoder, so I have the following pipeline:

  • A callback informing that a new frame has arrived is called
  • I copy the frame to CUDA memory and perform the needed color space conversions (only the first cuMemcpy is synchronous, so I can return from the callback; all remaining operations are pushed into a dedicated stream)
  • I push an event onto the stream and have another thread waiting for it; as soon as it is signaled, I take the CUDA memory pointer with the frame in the correct color space and feed it to the encoder (a sketch of this pipeline follows the list)
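
For reference, here is a minimal sketch of that pipeline using the driver API. `launchColorConvert` and `feedToEncoder` are hypothetical placeholders for the conversion kernel launch and the NVENC submission, not real API calls, and error checking is omitted:

```c
#include <cuda.h>

// Hypothetical helpers, stand-ins for the real conversion/encode calls.
void launchColorConvert(CUstream s, CUdeviceptr src, CUdeviceptr dst);
void feedToEncoder(CUdeviceptr frame);

typedef struct {
    CUstream    stream;      // dedicated stream for this transcoder
    CUevent     converted;   // signaled when the frame is in the right format
    CUdeviceptr rawFrame;    // device copy of the incoming frame
    CUdeviceptr convFrame;   // color-converted frame, ready for the encoder
} Pipeline;

// Called from the capture callback.
void onFrame(Pipeline* p, const void* hostFrame, size_t bytes)
{
    // Synchronous copy: once it returns, the caller may reuse hostFrame.
    cuMemcpyHtoD(p->rawFrame, hostFrame, bytes);

    // Everything else is asynchronous on the dedicated stream.
    launchColorConvert(p->stream, p->rawFrame, p->convFrame);
    cuEventRecord(p->converted, p->stream);
    // Return immediately; the GPU keeps working in the background.
}

// Runs on the encoder thread.
void encoderLoop(Pipeline* p)
{
    cuEventSynchronize(p->converted);  // block until the conversion is done
    feedToEncoder(p->convFrame);       // hypothetical NVENC submission
}
```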

For some reason I had assumed that I need a dedicated context for each thread performing this pipeline in parallel. The code was slow, and after some reading I understood that context switching is actually expensive; I then came to the conclusion that it makes no sense anyway, since a context owns the whole GPU, so I would lock out any parallel processing from other transcoder threads.

Question 1: In this scenario am I good with using a single context and an explicit stream created on this context for each thread that performs the mentioned pipeline?

Question 2: Can someone enlighten me on what is the sole purpose of the CUDA device context? I assume it makes sense in a multiple GPU scenario, but are there any cases where I would want to create multiple contexts for one GPU?

asked Apr 30 '15 by Rudolfs Bundulis

2 Answers

Question 1: In this scenario am I good with using a single context and an explicit stream created on this context for each thread that performs the mentioned pipeline?

You should be fine with a single context.
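
As a minimal illustration (driver API, assuming a single device and omitting error checking), the setup could look like this:

```c
#include <cuda.h>

CUcontext ctx;

// Run once at process startup.
void initOnce(void)
{
    CUdevice dev;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);   // single context for the whole process
}

// Body of each transcoder thread.
void transcoderThread(void)
{
    cuCtxSetCurrent(ctx);        // bind the shared context to this thread
    CUstream stream;
    cuStreamCreate(&stream, CU_STREAM_NON_BLOCKING);

    // ... enqueue copies/kernels on `stream`; streams created by different
    // threads can overlap because they all share the one context ...

    cuStreamDestroy(stream);
}
```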

Question 2: Can someone enlighten me on what is the sole purpose of the CUDA device context? I assume it makes sense in a multiple GPU scenario, but are there any cases where I would want to create multiple contexts for one GPU?

The CUDA device context is discussed in the programming guide. It represents all of the state (memory map, allocations, kernel definitions, and other state-related information) associated with a particular process (i.e. associated with that particular process' use of a GPU). Separate processes will normally have separate contexts (as will separate devices), as these processes have independent GPU usage and independent memory maps.

If you have multi-process usage of a GPU, you will normally create multiple contexts on that GPU. As you've discovered, it's possible to create multiple contexts from a single process, but not usually necessary.

And yes, when you have multiple contexts, kernels launched in those contexts will require context switching to go from one kernel in one context to another kernel in another context. Those kernels cannot run concurrently.

The CUDA runtime API manages contexts for you; you normally don't interact with a CUDA context explicitly when using the runtime API. In driver API usage, however, the context is explicitly created and managed.
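
A hedged sketch of that difference (the same allocation done both ways, error checking omitted):

```c
#include <cuda.h>          // driver API
#include <cuda_runtime.h>  // runtime API

void runtimeStyle(void)
{
    void* p;
    cudaMalloc(&p, 1024);  // first runtime call lazily creates/activates
                           // the device's primary context behind the scenes
    cudaFree(p);
}

void driverStyle(void)
{
    CUdevice dev;
    CUcontext ctx;
    CUdeviceptr p;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);  // explicit context creation and binding
    cuMemAlloc(&p, 1024);
    cuMemFree(p);
    cuCtxDestroy(ctx);
}
```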

answered Sep 22 '22 by Robert Crovella


Obviously a few years have passed, but NVENC/NVDEC now appear to have CUstream support as of version 9.1 (circa September 2019) of the Video Codec SDK: https://developer.nvidia.com/nvidia-video-codec-sdk/download

NEW to 9.1- Encode: CUStream support in NVENC for enhanced parallelism between CUDA pre-processing and NVENC encoding

I'm super new to CUDA, but my basic understanding is that CUcontexts allow multiple processes to use the GPU (by doing context swaps that interrupt each other's work), while CUstreams allow for a coordinated sharing of the GPU's resources from within a single process.
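
As a rough illustration of what that buys you, here is a hedged sketch built on the `nvEncSetIOCudaStreams` entry point that the 9.1 release notes refer to; treat the exact types and signature as an assumption and verify against the SDK's `nvEncodeAPI.h`:

```c
#include <cuda.h>
#include "nvEncodeAPI.h"  // Video Codec SDK header

// Sketch: hand the CUDA pre-processing stream to NVENC so encoding is
// ordered after the conversion kernels without a full device sync.
// The exact signature of nvEncSetIOCudaStreams is assumed here.
void encodeWithStream(NV_ENCODE_API_FUNCTION_LIST* api, void* encoder,
                      CUstream preprocStream)
{
    // Tell NVENC to order its input reads (and output writes) on our
    // stream instead of synchronizing the whole device.
    api->nvEncSetIOCudaStreams(encoder,
                               (NV_ENC_CUSTREAM_PTR)&preprocStream,
                               (NV_ENC_CUSTREAM_PTR)&preprocStream);

    // ... enqueue color-conversion kernels on preprocStream, then submit
    // the frame to the encoder; NVENC consumes it in stream order ...
}
```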

answered Sep 19 '22 by aggieNick02