
conda cache for docker

Tags:

docker

conda

This is a very similar question to: Docker build: use http cache

I would like to set up a Docker container with a custom conda environment. The corresponding Dockerfile is:

FROM continuumio/miniconda3

WORKDIR /app
COPY . /app

RUN conda update -y conda
RUN conda env create -f environment.yml
RUN echo "source activate my_env" > ~/.bashrc
ENV PATH /opt/conda/envs/my_env/bin:$PATH

My environment is rather large, a minimal version could look like this:

name: my_env
channels:
  - defaults
dependencies:
  - python=3.6.8=h0371630_0
prefix: /opt/conda

Every time I change the dependencies, I have to rebuild the image, which means re-downloading all the packages. Is it possible to set up a cache somehow? Interfacing the containerized conda with a cache outside the container probably defeats the purpose of containerizing it in the first place, but maybe it is still possible?

lhk asked Jan 14 '19 14:01


2 Answers

Docker BuildKit now has a feature for exactly this, called cache mounts. For the precise syntax, see the Dockerfile reference for RUN --mount. To use this feature, change:

RUN conda env create -f environment.yml

to

RUN --mount=type=cache,target=/opt/conda/pkgs conda env create -f environment.yml

and make sure that BuildKit is enabled (e.g. via export DOCKER_BUILDKIT=1). The cache persists between builds and is shared between concurrent builds.
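
For context, here is a minimal sketch of how the whole Dockerfile from the question might look with the cache mount in place. It assumes the environment is named my_env as in the question's environment.yml; the image tag my_env_image in the comment is a placeholder, and copying environment.yml before the rest of the sources is an optional reordering, not part of the original answer.

```dockerfile
# syntax=docker/dockerfile:1
# Build with BuildKit enabled, for example:
#   DOCKER_BUILDKIT=1 docker build -t my_env_image .
# (my_env_image is a placeholder tag, not taken from the question)
FROM continuumio/miniconda3

WORKDIR /app

# Copy only the environment file first, so that edits to other source
# files do not invalidate the environment layer (optional reordering).
COPY environment.yml /app/

# /opt/conda/pkgs is conda's package cache in this base image. The cache
# mount keeps downloaded packages between builds without baking them
# into the image layers.
RUN --mount=type=cache,target=/opt/conda/pkgs \
    conda env create -f environment.yml

# Assumes the environment name my_env from environment.yml.
ENV PATH=/opt/conda/envs/my_env/bin:$PATH

COPY . /app
```

When environment.yml changes, the RUN step still re-executes, but packages already present in the cache mount should not need to be downloaded again.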

Simon Boehm answered Sep 18 '22 10:09


This is a very indirect answer to the question, but it works like a charm for me.

Out of the many dependencies, there is a large subset which never changes. I always need python 3.6, numpy, pandas, torch, ...

So, instead of caching conda packages, you can rely on Docker's own image caching and reuse a base image with those dependencies already installed:

FROM continuumio/miniconda3

WORKDIR /app
COPY environment.yml /app

# install package dependencies
RUN conda update -y conda
RUN conda env create -f environment.yml
RUN echo "source activate api_neural" > ~/.bashrc
ENV PATH /opt/conda/envs/api_neural/bin:$PATH

Build this image and tag it base_deps. Then you can add additional configuration on top of it in a second Dockerfile:

FROM base_deps

# add additional things on top, here I'm running some python in the conda env
RUN /bin/bash -c 'echo $(which python);\
source activate api_neural;\
python -c "import nltk; nltk.download(\"wordnet\"); nltk.download(\"words\")";\
python -m spacy download en;\
python -c "from fastai import untar_data, URLs; model_path = untar_data(URLs.WT103, data=False)"'
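
The answer does not spell out how the two Dockerfiles are tied together, so here is a rough sketch of one way it might be done. The file names Dockerfile.base and Dockerfile.app and the tag my_app are hypothetical; only the tag base_deps comes from the answer itself.

```dockerfile
# Dockerfile.app (hypothetical file name)
#
# Assumed build order, run from the project directory:
#   docker build -f Dockerfile.base -t base_deps .   # slow, installs the conda env
#   docker build -f Dockerfile.app  -t my_app .      # fast, reuses base_deps
#
# The base image only has to be rebuilt when environment.yml changes.
FROM base_deps

WORKDIR /app

# Application sources change often, so they are added only in this image.
COPY . /app
```

With this split, everyday code changes rebuild only the second image, and the conda packages are downloaded again only when the base image is rebuilt.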

lhk answered Sep 19 '22 10:09