I am trying to build a Docker container on a server within which a conda environment is built. All the other requirements are satisfied except for CUDA enabled PyTorch (I can get PyTorch working without CUDA however, no problem). How do I make sure PyTorch is using CUDA?
This is the Dockerfile
:
# Use nvidia/cuda image
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
# set bash as current shell
RUN chsh -s /bin/bash
# install anaconda
RUN apt-get update
RUN apt-get install -y wget bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 git mercurial subversion && \
apt-get clean
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh -O ~/anaconda.sh && \
/bin/bash ~/anaconda.sh -b -p /opt/conda && \
rm ~/anaconda.sh && \
ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
find /opt/conda/ -follow -type f -name '*.a' -delete && \
find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
/opt/conda/bin/conda clean -afy
# set path to conda
ENV PATH /opt/conda/bin:$PATH
# setup conda virtual environment
COPY ./requirements.yaml /tmp/requirements.yaml
RUN conda update conda \
&& conda env create --name camera-seg -f /tmp/requirements.yaml \
&& conda install -y -c conda-forge -n camera-seg flake8
# From the pythonspeed tutorial; Make RUN commands use the new environment
SHELL ["conda", "run", "-n", "camera-seg", "/bin/bash", "-c"]
# PyTorch with CUDA 10.2
RUN conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
RUN echo "conda activate camera-seg" > ~/.bashrc
ENV PATH /opt/conda/envs/camera-seg/bin:$PATH
This gives me the following error when I try to build this container ( docker build -t camera-seg .
):
.....
Step 10/12 : RUN conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
---> Running in e0dd3e648f7b
ERROR conda.cli.main_run:execute(34): Subprocess for 'conda run ['/bin/bash', '-c', 'conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch']' command failed. (See above for error)
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run
$ conda init <SHELL_NAME>
Currently supported shells are:
- bash
- fish
- tcsh
- xonsh
- zsh
- powershell
See 'conda init --help' for more information and options.
IMPORTANT: You may need to close and restart your shell after running 'conda init'.
The command 'conda run -n camera-seg /bin/bash -c conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch' returned a non-zero code: 1
This is the requirements.yaml
:
name: camera-seg
channels:
- defaults
- conda-forge
dependencies:
- python=3.6
- numpy
- pillow
- yaml
- pyyaml
- matplotlib
- jupyter
- notebook
- tensorboardx
- tensorboard
- protobuf
- tqdm
When I put pytorch
, torchvision
and cudatoolkit=10.2
within the requirements.yaml
, then PyTorch is successfully installed but it cannot recognize CUDA ( torch.cuda.is_available()
returns False
).
I have tried various solutions, for example, this, this and this and some different combinations of them but all to no avail.
Any help is much appreciated. Thanks.
Yes it's needed, since the binaries ship with their own libraries and will not use your locally installed CUDA toolkit unless you build PyTorch from source or a custom CUDA extension. PyTorch install: CUDA 11.3 or 11.6?
I got it working after many, many tries. Posting the answer here in case it helps anyone.
Basically, I installed pytorch
and torchvision
through pip
(from within the conda
environment) and rest of the dependencies through conda
as usual.
This is how the final Dockerfile
looks:
# Use nvidia/cuda image
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
# set bash as current shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
# install anaconda
RUN apt-get update
RUN apt-get install -y wget bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 git mercurial subversion && \
apt-get clean
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh -O ~/anaconda.sh && \
/bin/bash ~/anaconda.sh -b -p /opt/conda && \
rm ~/anaconda.sh && \
ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
find /opt/conda/ -follow -type f -name '*.a' -delete && \
find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
/opt/conda/bin/conda clean -afy
# set path to conda
ENV PATH /opt/conda/bin:$PATH
# setup conda virtual environment
COPY ./requirements.yaml /tmp/requirements.yaml
RUN conda update conda \
&& conda env create --name camera-seg -f /tmp/requirements.yaml
RUN echo "conda activate camera-seg" >> ~/.bashrc
ENV PATH /opt/conda/envs/camera-seg/bin:$PATH
ENV CONDA_DEFAULT_ENV $camera-seg
And this is how the requirements.yaml
looks like:
name: camera-seg
channels:
- defaults
- conda-forge
dependencies:
- python=3.6
- pip
- numpy
- pillow
- yaml
- pyyaml
- matplotlib
- jupyter
- notebook
- tensorboardx
- tensorboard
- protobuf
- tqdm
- pip:
- torch
- torchvision
Then I build the container using the command docker build -t camera-seg .
and PyTorch is now being able to recognize CUDA.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With