I have a question regarding an implementation of a Dockerfile on dask-docker.
FROM continuumio/miniconda3:4.8.2
RUN conda install --yes \
-c conda-forge \
python==3.8 \
[...]
&& rm -rf /opt/conda/pkgs
COPY prepare.sh /usr/bin/prepare.sh
RUN mkdir /opt/app
ENTRYPOINT ["tini", "-g", "--", "/usr/bin/prepare.sh"]
prepare.sh is just facilitating installation of additional packages via conda, pip and apt.
There are two things I don't get about that:
exec "$@" where one can start a scheduler or worker - that's more what I associate with tini.This way everytime you run the container from the built image you have to repeat the installation process!?
Maybe I'm overthinking it but it seems rather unusual - but maybe that's a Dockerfile pattern with good reasons for it.
optional bonus questions for Dask insiders:
/usr/bin (instead of f.x. to /tmp)?/opt/app?It really depends on the nature and usage of the files being installed by the entry point script. In general, I like to break this down into a few categories:
Local files that are subject to frequent changes on the host system, and will be rolled into the final image for production release. This is for things like the source code for an application that is under development and needs to be tested in the container. You want these to be copied into the runtime every time the image is rebuilt. Use a COPY in the Dockerfile.
Files from other places that change frequently and/or are specific to the deployment environment. This is stuff like secrets from a Hashicorp vault, network settings, server configurations, etc.... that will probably be downloaded into the container all the time, even when it goes into production. The entry point script should download these, and it should decide which files to get and from where based on environment variables that are injected by the host.
libraries, executable programs (under /bin, /usr/local/bin, etc...), and things that specifically should not change except during a planned upgrade. Usually anything that is installed using pip, maven or some other program that does dependency management, and anything installed with apt-get or equivalent. These files should not be installed from the Dockerfile or from the entrypoint script. Much, much better is to build your base image with all of the dependencies already installed, and then use that image as the FROM source for further development. This has a number of advantages: it ensures a stable, centrally located starting platform that everyone can use for development and testing (it forces uniformity where it counts); it prevents you from hammering on the servers that host those libraries (constantly re-downloading all of those libraries from pypy.org is really bad form... someone has to pay for that bandwidth); it makes the build faster; and if you have a separate security team, this might help reduce the number of files they need to scan.
You are probably looking at #3, but I'm including all three since I think it's a helpful way to categorize things.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With