Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Concurrent GPU kernel execution from multiple processes

I have an application in which I would like to share a single GPU between multiple processes. That is, each of these processes would create its own CUDA or OpenCL context, targeting the same GPU. According to the Fermi white paper[1], application-level context switching is less then 25 microseconds, but the launches are effectively serialized as they launch on the GPU -- so Fermi wouldn't work well for this. According to the Kepler white paper[2], there is something called Hyper-Q that allows for up to 32 simultaneous connections from multiple CUDA streams, MPI processes, or threads within a process.

My questions: Has anyone tried this on a Kepler GPU and verified that its kernels are run concurrently when scheduled from distinct processes? Is this just a CUDA feature, or can it also be used with OpenCL on Nvidia GPUs? Do AMD's GPUs support something similar?

[1] http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf

[2] http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf

like image 892
Brendan Wood Avatar asked Oct 01 '12 19:10

Brendan Wood


1 Answers

In response to the first question, NVIDIA has published some hyper-Q results in a blog here. The blog is pointing out that the developers who were porting CP2K were able to get to accelerated results more quickly because hyper-Q allowed them to use the application's MPI structure more or less as-is and run multiple ranks on a single GPU, and get higher effective GPU utilization that way. As mentioned in the comments, this (hyper-Q) feature is only available on K20 processors currently, as it is dependent on the GK110 GPU.

like image 92
Robert Crovella Avatar answered Oct 17 '22 12:10

Robert Crovella