Managing multiple GPUs with multiple users

Question

I have a server (Ubuntu 16.04) with 4 GPUs. My team shares it, and our current approach is to containerize all of our work with Docker and to restrict containers to specific GPUs using something like $ NV_GPU=0 nvidia-docker run -ti nvidia/cuda nvidia-smi. This works well when we're all very clear about who's using which GPU, but our team has grown and I'd like a more robust way of monitoring GPU use and prohibiting access to GPUs that are in use. nvidia-smi is one source of information through its "GPU-Util" column, but a GPU may show 0% GPU-Util at a given moment even though it is reserved by someone working in a container.

Do you have any recommendations for:

Tracking when a user runs $ NV_GPU='gpu_id' nvidia-docker run
Raising an error when another user runs $ NV_GPU='same_gpu_id' nvidia-docker run
Keeping an up-to-date log along the lines of {'gpu0':'user_name or free', . . ., 'gpu3':'user_name or free'}, which for every GPU either names the user who ran the active Docker container using that GPU or states that it is 'free'. Ideally it would record both the user and the container linked to the GPU.
Updating the log when the user closes the container that is utilizing the gpu
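
As an illustration only (nothing like this is in place yet), the reservation log described in the list above could be sketched as a small JSON file guarded by a file lock. Everything here is an assumption made for the sketch: the function names reserve, release and status, the GpuBusyError exception, and the idea of storing the log in a single file. It relies on fcntl, so it is Unix-only, which matches the Ubuntu server.

```python
import fcntl
import json
import os
from contextlib import contextmanager

NUM_GPUS = 4  # the server in the question has 4 GPUs


class GpuBusyError(RuntimeError):
    """Raised when someone tries to reserve a GPU that is already taken."""


@contextmanager
def _locked_registry(path):
    """Yield the registry dict under an exclusive lock, then write it back."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o666)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other process holds it
        raw = f.read()
        if raw:
            state = json.loads(raw)
        else:
            state = {f"gpu{i}": "free" for i in range(NUM_GPUS)}
        yield state
        # Only reached if the caller's block finished without raising.
        f.seek(0)
        f.truncate()
        json.dump(state, f)
        # The lock is released when the file is closed.


def reserve(path, gpu_id, user, container):
    """Mark a GPU as used by (user, container); fail if it is not free."""
    key = f"gpu{gpu_id}"
    with _locked_registry(path) as state:
        if key not in state:
            raise ValueError(f"unknown GPU: {key}")
        if state[key] != "free":
            raise GpuBusyError(f"{key} is reserved by {state[key]['user']}")
        state[key] = {"user": user, "container": container}


def release(path, gpu_id, user):
    """Mark a GPU as free again; only the user holding it may do so."""
    key = f"gpu{gpu_id}"
    with _locked_registry(path) as state:
        holder = state.get(key, "free")
        if holder == "free":
            return
        if holder["user"] != user:
            raise PermissionError(f"{key} is held by {holder['user']}")
        state[key] = "free"


def status(path):
    """Return a copy of the current log, e.g. {'gpu0': 'free', ...}."""
    with _locked_registry(path) as state:
        return dict(state)
```

A wrapper script around $ NV_GPU='gpu_id' nvidia-docker run could call reserve() before starting the container and release() once it exits. That would cover tracking, the error on a double booking, and updating the log on close, but only for containers started through the wrapper.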

I may be thinking about this the wrong way too, so open to other ideas. Thanks!

Eljas Hyyrynen · Accepted Answer

This sounds like a great place to apply CI/CD practices. What you need is a job queue. Each user requests the resources (the GPUs) by triggering a pipeline in some way, e.g. by pushing a commit to a specific branch. An automated system then allocates the shared resources in order, and everybody eventually gets their experiments done.
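
To make the job-queue idea concrete, here is a minimal sketch. It is not a CI/CD pipeline: it swaps in a plain in-process FIFO queue with one worker thread per GPU, just to show the allocation logic. The names run_job, gpu_worker and run_queue are made up for this example, and the only thing taken from the question is that a job is pinned to a GPU through the NV_GPU environment variable.

```python
import os
import queue
import subprocess
import threading


def run_job(command, gpu_id):
    """Run one command with NV_GPU set, as in the question's nvidia-docker usage."""
    env = dict(os.environ, NV_GPU=str(gpu_id))
    return subprocess.run(command, env=env).returncode


def gpu_worker(gpu_id, jobs, results, runner):
    """Take jobs off the shared queue in FIFO order and run them on one GPU."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this worker
            break
        user, command = job
        results.append((user, gpu_id, runner(command, gpu_id)))


def run_queue(submissions, gpu_ids=(0, 1, 2, 3), runner=run_job):
    """Run every (user, command) submission, never more than one job per GPU."""
    jobs = queue.Queue()
    results = []
    workers = [
        threading.Thread(target=gpu_worker, args=(gpu_id, jobs, results, runner))
        for gpu_id in gpu_ids
    ]
    for worker in workers:
        worker.start()
    for submission in submissions:
        jobs.put(submission)
    for _ in workers:
        jobs.put(None)  # one sentinel per worker
    for worker in workers:
        worker.join()
    return results  # list of (user, gpu_id, return_code)
```

A submission would look like ("alice", ["nvidia-docker", "run", "--rm", "nvidia/cuda", "nvidia-smi"]). Because each worker owns exactly one GPU id, two jobs can never land on the same GPU at once, and nobody has to pick a GPU by hand. In a real setup the queue and workers would be the CI system's own job scheduler and runners rather than threads in one script.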

This is probably the most scalable way to do it, much more so than reservation calendars or ad hoc usage. The only more scalable option is to buy compute from a cloud provider, but that is outside the scope of the OP's question.

Tags: docker, gpu, nvidia, multi-gpu

SocraticDatum
