We are using gitlab continuous integration to build and test our projects. Recently, one of the projects added the requirement for CUDA to enable GPU acceleration. I do not want to change our pipeline (docker and gitlab-ci are working well for us), so I'd like to somehow give docker the ability to talk to an nvidia GPU.
Additional details:
You can't normally pass the --runtime parameter to gitlab CI, so you can't use nvidia's suggested docker invocation. [edit: actually, you can now. See https://gitlab.com/gitlab-org/gitlab-runner/merge_requests/764] There is also now the --gpus flag on gitlab runner versions >13.9. You should use that instead. If stuck with older versions, read on.
There are multiple steps:
1. Install the nvidia driver and nvidia-docker2 on the host PC
2. Test that nvidia-docker works on the host
3. Create a docker image that includes CUDA
4. Configure the gitlab runner to use the nvidia runtime
Note that if you only want to compile CUDA code and don't need to run it, you don't need nvidia-docker2 or the nvidia driver on the host PC, and there are no special steps for getting it working in gitlab CI (i.e. you only have to do step 3).
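For example, a compile-only job might look something like the following sketch (the job name and my_app.cu are placeholders of mine; it assumes one of nvidia's -devel image tags, which include nvcc, unlike the -base tags):

compile-cuda:
  image: nvidia/cuda:9.2-devel
  script:
    - nvcc --version
    - nvcc -o my_app my_app.cu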
I'm afraid I'm not too familiar with docker, so if I've mixed up container and image, I apologize. If someone with more knowledge wants to fix any typos about docker, it would be greatly appreciated.
You have two options here. Either you can use your host OS's recommended procedure. This is easy, but will mean that the environment may differ across build servers. The other option is to download the installer directly from nVidia (i.e. https://www.nvidia.com/object/unix.html) so that you can distribute it with your docker container.
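If you go the installer route, the host-side driver install is roughly as follows (the filename depends on the version you download, and this is just a sketch rather than the full procedure for your distribution):

chmod +x NVIDIA-Linux-x86_64-<version>.run
sudo ./NVIDIA-Linux-x86_64-<version>.run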
My current test PC is archlinux, so this was a case of installing it from the AUR. nVidia provides repositories for several OSes, so see the quickstart guide on the nvidia-docker github page.
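For an apt-based host, the nvidia-docker2 setup from nvidia's repositories looked roughly like this at the time (check the quickstart guide for current instructions rather than copying this verbatim):

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-docker2
# restart the docker daemon so it picks up the nvidia runtime
sudo systemctl restart docker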
You should test your nvidia-docker installation as per the quickstart guide. From your host PC, the command:
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
should run and output something like:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.18       Driver Version: 415.18       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:02:00.0  On |                  N/A |
| 28%   39C    P0    24W / 120W |    350MiB /  6071MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
Notice that although I've specified the 9.0-base image, nvidia-smi reports CUDA 10. I think this is because CUDA 10 is installed on the host PC (nvidia-smi reports the driver's CUDA version). The nvidia-docker documentation says that it will use the CUDA from the docker image, so this shouldn't be a problem.
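If you want to confirm which toolkit a given image itself ships (rather than the CUDA version the driver reports), something like this should work; note the -devel tag, since the -base images don't include nvcc, and no --runtime is needed just to query the toolkit:

docker run --rm nvidia/cuda:9.0-devel nvcc --version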
You should use the Nvidia dockerhub docker images directly unless you have a good reason not to. In my case, I wanted to use a docker image based on Debian, but Nvidia only provides images for Ubuntu and CentOS. Fortunately, Nvidia posts the Dockerfiles for their images, so you can copy the relevant parts from them. I based mine on https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/9.2/base/Dockerfile
The magic part of the dockerfile included:
# Install cuda manually
# (wget and expect need to be available in the image; on a Debian base, something like:)
RUN apt-get update && \
    apt-get install -y --no-install-recommends wget expect ca-certificates && \
    rm -rf /var/lib/apt/lists/*
RUN wget https://developer.nvidia.com/compute/cuda/9.2/Prod2/local_installers/cuda_9.2.148_396.37_linux
COPY install_cuda.exp install_cuda.exp
RUN mv cuda_* cuda_install_bin && \
    chmod +x cuda_install_bin && \
    expect install_cuda.exp && \
    rm cuda_*
# Magic copied from nvidia's cuda9.2 dockerfile at
# https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/9.2/base/Dockerfile
ENV CUDA_VERSION 9.2.148
LABEL com.nvidia.volumes.needed="nvidia_driver"
LABEL com.nvidia.cuda.version="${CUDA_VERSION}"
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf && \
    echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV NVIDIA_REQUIRE_CUDA "cuda>=9.2"
The "expect" command is will allow you to write a script to automatically accept the license agreement etc. automatically. It's probably not a good idea for me to post the install_cuda.exp
file (because I can't accept the agreement for you), but in my case I accepted the eula, agreed to install it on an unsupported OS, did not install the graphics driver, did install cuda, used the default path, installed a symlink to usr/local/cuda and did not install the samples.
For more information on expect, see its man page.
The expect file is mostly made up of lines like expect -- "(y)es/(n)o/(q)uit:" { send "y\r" }
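For illustration only, a skeletal install_cuda.exp might look something like this; it is not the actual script, the real prompts depend on the installer version, and every answer (including accepting the EULA) is a decision you have to make yourself. The spawn line assumes the installer was renamed to cuda_install_bin as in the Dockerfile above:

#!/usr/bin/expect -f
# Hypothetical sketch: answer the installer's interactive prompts.
set timeout -1
spawn ./cuda_install_bin
expect {
    -- "accept/decline/quit:" { send "accept\r"; exp_continue }
    -- "(y)es/(n)o/(q)uit:"   { send "y\r";      exp_continue }
    eof
}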
You should check that you can run the nvidia-smi test command using your own container (i.e. docker run --runtime=nvidia -it your_image_here /bin/sh).
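Putting it together, building and smoke-testing the image looks roughly like this (build_machine is just the tag used in the runner configuration below):

docker build -t build_machine .
docker run --runtime=nvidia --rm build_machine nvidia-smi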
When researching around the web, most sources tell you that you can't supply the --runtime flag from the gitlab runner configuration. Actually, according to the merge request linked above, you can. To do so, you have to edit /etc/gitlab-runner/config.toml and add runtime = "nvidia" in the right place.
For example, my runner configuration looks like:
[[runners]]
  name = "docker-runner-test"
  url = "<<REDACTED>>"
  token = "<<REDACTED>>"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "build_machine"
    privileged = false
    disable_cache = false
    runtime = "nvidia"
    volumes = ["/cache"]
    pull_policy = "never"
    shm_size = 0
  [runners.cache]
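With that in place, a quick sanity check from CI itself can be as simple as a job that calls nvidia-smi (hypothetical job name; the default image is the build_machine tag set in the runner configuration above):

check-gpu:
  script:
    - nvidia-smi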
The --gpus all argument is supported by GitLab Runner since version 13.9:
https://docs.gitlab.com/runner/configuration/gpus.html
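On those newer versions, per the docs linked above, you enable GPUs for the docker executor in config.toml instead of relying on the nvidia runtime:

[runners.docker]
  gpus = "all"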