Why are dependencies installed via an ENTRYPOINT and tini?

Question

I have a question about the Dockerfile in dask-docker:

FROM continuumio/miniconda3:4.8.2

RUN conda install --yes \
    -c conda-forge \
    python==3.8 \
    [...]
    && rm -rf /opt/conda/pkgs

COPY prepare.sh /usr/bin/prepare.sh

RUN mkdir /opt/app

ENTRYPOINT ["tini", "-g", "--", "/usr/bin/prepare.sh"]

prepare.sh just facilitates the installation of additional packages via conda, pip and apt.

There are two things I don't get about that:

1. Why not just place those instructions in the Dockerfile? Possibly indirectly (modularized) by COPYing dedicated files (requirements.txt, environment.yaml, ...).
2. Why execute this via tini? At the end the script runs exec "$@", which lets one start a scheduler or worker; that is more what I associate with tini.
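
To make the pattern concrete, here is a minimal sketch of what an entrypoint script of this kind could look like. It is not copied from dask-docker: the environment variable names (EXTRA_APT_PACKAGES, EXTRA_CONDA_PACKAGES, EXTRA_PIP_PACKAGES) and the DRY_RUN switch are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical entrypoint sketch. All variable names below are assumptions.

# Run a command, or only print it when DRY_RUN=1 (useful to inspect the plan).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_extras() {
    # System packages first, since Python packages may need native libraries.
    if [ -n "${EXTRA_APT_PACKAGES:-}" ]; then
        run apt-get update
        # Unquoted on purpose: the variable holds a space-separated list.
        run apt-get install -y $EXTRA_APT_PACKAGES
    fi
    if [ -n "${EXTRA_CONDA_PACKAGES:-}" ]; then
        run conda install --yes $EXTRA_CONDA_PACKAGES
    fi
    if [ -n "${EXTRA_PIP_PACKAGES:-}" ]; then
        run pip install $EXTRA_PIP_PACKAGES
    fi
}

install_extras

# Replace the shell with the requested command (e.g. a scheduler or worker).
# The command then runs as tini's direct child, so tini can forward signals
# to it and reap zombie processes.
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

With a script like this, the image stays generic and each container is customized at start-up, for example with `docker run -e EXTRA_PIP_PACKAGES="s3fs" <image> dask-scheduler`, where `<image>` is a placeholder.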

This way, every time you run a container from the built image you have to repeat the installation process!?

Maybe I'm overthinking it, but it seems rather unusual. Or maybe it is a Dockerfile pattern with good reasons behind it.

Optional bonus questions for Dask insiders:

1. Why copy prepare.sh to /usr/bin (instead of, e.g., /tmp)?
2. What purpose does the created directory /opt/app serve?

Z4-tier · Accepted Answer

It really depends on the nature and usage of the files being installed by the entry point script. In general, I like to break this down into a few categories:

1. Local files that are subject to frequent changes on the host system and will be rolled into the final image for production release. This covers things like the source code of an application that is under development and needs to be tested in the container. You want these to be copied into the image every time it is rebuilt. Use a COPY in the Dockerfile.
2. Files from other places that change frequently and/or are specific to the deployment environment. This is stuff like secrets from a HashiCorp Vault, network settings, server configurations, etc., that will probably be downloaded into the container every time it starts, even in production. The entrypoint script should download these, and it should decide which files to get and from where based on environment variables that are injected by the host.
3. Libraries, executable programs (under /bin, /usr/local/bin, etc.), and things that specifically should not change except during a planned upgrade. Usually this is anything installed using pip, maven or some other program that does dependency management, and anything installed with apt-get or equivalent. These files should not be installed from the Dockerfile or from the entrypoint script. Much, much better is to build your base image with all of the dependencies already installed, and then use that image as the FROM source for further development. This has a number of advantages: it ensures a stable, centrally located starting platform that everyone can use for development and testing (it forces uniformity where it counts); it prevents you from hammering the servers that host those libraries (constantly re-downloading all of those libraries from pypi.org is really bad form; someone has to pay for that bandwidth); it makes the build faster; and if you have a separate security team, it might help reduce the number of files they need to scan.
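
For the second category, an entrypoint script along these lines could pick the configuration to download from variables injected by the host. This is only a sketch under stated assumptions: DEPLOY_ENV, CONFIG_BASE_URL, CONFIG_PATH, the example.com host and the file layout are all hypothetical, and a real setup pulling secrets from a vault would use that tool's own client instead of a plain curl call.

```shell
#!/bin/sh
# Hypothetical sketch: DEPLOY_ENV, CONFIG_BASE_URL, CONFIG_PATH and the
# URL layout are assumptions, not part of any real project.

# Map the injected environment name to the config file to fetch.
config_url() {
    base="${CONFIG_BASE_URL:-https://config.example.com}"
    env_name="${DEPLOY_ENV:-dev}"
    case "$env_name" in
        prod|staging|dev)
            echo "$base/$env_name/app.conf"
            ;;
        *)
            echo "unknown DEPLOY_ENV: $env_name" >&2
            return 1
            ;;
    esac
}

fetch_config() {
    url="$(config_url)" || return 1
    # -f: fail on HTTP errors; -sS: no progress bar, but still show errors.
    curl -fsS "$url" -o "${CONFIG_PATH:-/etc/app/app.conf}"
}

# Act as an entrypoint only when a command was passed to the container.
if [ "$#" -gt 0 ]; then
    fetch_config || exit 1
    # Hand over to the real process so it receives signals directly.
    exec "$@"
fi
```

The same image can then be started with `-e DEPLOY_ENV=staging` or `-e DEPLOY_ENV=prod` and fetch a different file each time, without being rebuilt.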

You are probably looking at #3, but I'm including all three since I think it's a helpful way to categorize things.

Tags: docker, dask, tini

Raffael