I'm doing cross-platform testing (tooling, not kernel), so I have a custom image (used for ephemeral Jenkins slaves) for each OS, based on standard base images: centos6, centos7, ubuntu14, sles11, sles12, etc.
Aside from the base being different, my images have a lot in common with each other: all of them get a copy of pre-built and frequently changing maven/gradle/npm repositories for speed.
Here is a simplified example of the way the images are created (the tarball is the same across images):
# Dockerfile one
FROM centos:centos6
ADD some-files.tar.gz <dest>
# Dockerfile two
FROM ubuntu:14.04
ADD some-files.tar.gz <dest>
This results in large images (multi-GB) that have to be rebuilt regularly. Some layer reuse occurs between rebuilds thanks to the docker build cache, but it would be better if I could stop having to rebuild the images altogether.
How can I reliably share the common contents among my images?
The images don't change much outside of these directories. This cannot be a simple mounted volume. In use, the directories in this layer are modified, so the mount cannot be read-only, and the source must not be changed. What I'm looking for is closer to copy-on-write (COW), but applied to a specific subset of the image.
Later versions of Docker (17.05 and newer) support multi-stage Dockerfiles. With a multi-stage Dockerfile you can use several base images, one per FROM stage, and copy files or intermediate results from earlier stages into the final image.
When one image is built FROM another, the second image contains all the layers from the first image, plus the new layers created by its own COPY and RUN instructions. A container started from it adds a thin read-write layer on top. Docker already has all the layers from the first image, so it does not need to pull them again. The two images share every layer they have in common.
Another option is the --squash flag on docker build, which merges the new layers into a single layer at build time. To use it, add the flag to the build command: docker build --squash -t <image> . The flag is experimental, so you have to enable experimental features in the Docker settings first.
Problem with --cache-from:
The suggestion to use --cache-from will not work:
$ cat df.cache-from
FROM busybox
ARG UNIQUE_ARG=world
RUN echo Hello ${UNIQUE_ARG}
COPY . /files
$ docker build -t test-from-cache:1 -f df.cache-from --build-arg UNIQUE_ARG=docker .
Sending build context to Docker daemon 26.1MB
Step 1/4 : FROM busybox
---> 54511612f1c4
Step 2/4 : ARG UNIQUE_ARG=world
---> Running in f38f6e76bbca
Removing intermediate container f38f6e76bbca
---> fada1443b67b
Step 3/4 : RUN echo Hello ${UNIQUE_ARG}
---> Running in ee960473d88c
Hello docker
Removing intermediate container ee960473d88c
---> c29d98e09dd8
Step 4/4 : COPY . /files
---> edfa35e97e86
Successfully built edfa35e97e86
Successfully tagged test-from-cache:1
$ docker build -t test-from-cache:2 -f df.cache-from --build-arg UNIQUE_ARG=world --cache-from test-from-cache:1 .
Sending build context to Docker daemon 26.1MB
Step 1/4 : FROM busybox
---> 54511612f1c4
Step 2/4 : ARG UNIQUE_ARG=world
---> Using cache
---> fada1443b67b
Step 3/4 : RUN echo Hello ${UNIQUE_ARG}
---> Running in 22698cd872d3
Hello world
Removing intermediate container 22698cd872d3
---> dc5f801fc272
Step 4/4 : COPY . /files
---> addabd73e43e
Successfully built addabd73e43e
Successfully tagged test-from-cache:2
$ docker inspect test-from-cache:1 -f '{{json .RootFS.Layers}}' | jq .
[
"sha256:6a749002dd6a65988a6696ca4d0c4cbe87145df74e3bf6feae4025ab28f420f2",
"sha256:01bf0fcfc3f73c8a3cfbe9b7efd6c2bf8c6d21b6115d4a71344fa497c3808978"
]
$ docker inspect test-from-cache:2 -f '{{json .RootFS.Layers}}' | jq .
[
"sha256:6a749002dd6a65988a6696ca4d0c4cbe87145df74e3bf6feae4025ab28f420f2",
"sha256:c70c7fd4529ed9ee1b4a691897c2a2ae34b192963072d3f403ba632c33cba702"
]
The build output shows exactly where Docker stops using the cache. This happens at the RUN step, where the different build-arg value changes the command. The inspect output shows that the ID of the second layer changed, even though the same COPY command ran in both builds. Whenever a preceding layer differs, the cache from the other image's build cannot be used.
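One quick way to check how many layers two images actually share is to compare their RootFS.Layers lists from the bottom up. The sketch below is my own helper, not part of Docker. rootfs_layers wraps the same docker inspect call used above and assumes a local docker CLI. shared_layers counts the common leading layers, because a layer only lines up when everything beneath it matches as well. Applied to the two lists above, it reports one shared layer (busybox).

```python
import json
import subprocess


def rootfs_layers(image):
    """Return the layer DiffIDs of a local image (requires the docker CLI)."""
    result = subprocess.run(
        ["docker", "image", "inspect", "-f", "{{json .RootFS.Layers}}", image],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def shared_layers(layers_a, layers_b):
    """Count the leading (bottom-most) layers two images have in common."""
    count = 0
    for a, b in zip(layers_a, layers_b):
        if a != b:
            break  # layers above this point sit on a different parent
        count += 1
    return count


# DiffIDs copied from the docker inspect output above
image_1 = [
    "sha256:6a749002dd6a65988a6696ca4d0c4cbe87145df74e3bf6feae4025ab28f420f2",
    "sha256:01bf0fcfc3f73c8a3cfbe9b7efd6c2bf8c6d21b6115d4a71344fa497c3808978",
]
image_2 = [
    "sha256:6a749002dd6a65988a6696ca4d0c4cbe87145df74e3bf6feae4025ab28f420f2",
    "sha256:c70c7fd4529ed9ee1b4a691897c2a2ae34b192963072d3f403ba632c33cba702",
]
print(shared_layers(image_1, image_2))  # 1
```

With real images, shared_layers(rootfs_layers("test-from-cache:1"), rootfs_layers("test-from-cache:2")) performs the same comparison.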
The --cache-from option exists to let you trust the build steps of an image pulled from a registry. By default, Docker only trusts layers that were built locally. However, the same cache rules apply even when you provide this option.
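A toy model of the classic builder's cache lookup makes the rule concrete. It is a simplification I am assuming for illustration, not Docker's source code. A step is reused only if a previous result exists for the same parent image and the same instruction. RUN steps also depend on build-arg values, and COPY/ADD steps depend on a digest of the copied files (context_digest is a hypothetical stand-in). Passing --cache-from roughly corresponds to seeding the cache dict with another image's steps, but the keys are still computed the same way. Replaying the two builds above gives the same hit/miss pattern as the transcript.

```python
import hashlib


def cache_key(parent_id, instruction, build_args, context_digest):
    """Simplified, hypothetical cache key for one build step."""
    parts = [parent_id, instruction]
    if instruction.startswith("RUN"):
        # build args are visible to RUN, so their values affect the result
        parts.append(repr(sorted(build_args.items())))
    if instruction.startswith(("COPY", "ADD")):
        # copied files are part of the key (stand-in digest of the context)
        parts.append(context_digest)
    return hashlib.sha256("|".join(parts).encode()).hexdigest()


def simulate_build(base_id, instructions, build_args, context_digest, cache):
    """Run each step against the cache; report 'cache' or 'built' per step."""
    parent = base_id
    log = []
    for instruction in instructions:
        key = cache_key(parent, instruction, build_args, context_digest)
        if key in cache:
            log.append("cache")
        else:
            cache[key] = "img-" + key[:12]  # hypothetical ID of the new layer
            log.append("built")
        parent = cache[key]  # the next step is keyed on this step's result
    return log


steps = ["ARG UNIQUE_ARG=world", "RUN echo Hello ${UNIQUE_ARG}", "COPY . /files"]
cache = {}
print(simulate_build("busybox", steps, {"UNIQUE_ARG": "docker"}, "ctx", cache))
# ['built', 'built', 'built']
print(simulate_build("busybox", steps, {"UNIQUE_ARG": "world"}, "ctx", cache))
# ['cache', 'built', 'built']
```

Once the RUN step misses, the COPY step misses too, even with an identical context, because its parent ID has changed.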
Option 1:
If you want to reuse the build cache, the preceding layers must be identical in both images. You could try a multi-stage build if each base image is small enough. However, this loses all of the settings outside of the filesystem (environment variables, entrypoint specification, etc.), so you'd need to recreate those as well:
ARG base_image
FROM ${base_image} as base
# the above from line makes the base image available for later copying
FROM scratch
COPY large-content /content
COPY --from=base / /
# recreate any environment variables, labels, entrypoint, cmd, or other settings here
And then build that with:
docker build --build-arg base_image=base1 -t image1 .
docker build --build-arg base_image=base2 -t image2 .
docker build --build-arg base_image=base3 -t image3 .
This could also be multiple Dockerfiles if you need to change other settings. This will result in the entire contents of each base image being copied, so make sure each base image is significantly smaller than the shared content to make this worth the effort.
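The reason this layout shares the large content, and the original one does not, comes down to how layers are identified. The OCI image spec defines a ChainID for each layer: ChainID(L0) = DiffID(L0), and ChainID(Ln) = SHA256(ChainID(Ln-1) + " " + DiffID(Ln)). A layer's identity therefore includes everything beneath it. As far as I know, Docker's local layer store is keyed by ChainID. The sketch below compares both layouts for three base images. diff_id hashes a text label instead of a real layer tarball, so all digests are hypothetical stand-ins.

```python
import hashlib


def diff_id(label):
    """Hypothetical stand-in for a layer DiffID: the hash of a text label."""
    return "sha256:" + hashlib.sha256(label.encode()).hexdigest()


def chain_ids(diff_ids):
    """ChainIDs for layers listed bottom to top, as defined by the OCI image spec."""
    chain = []
    for layer in diff_ids:
        if chain:  # every layer above the first is keyed on its parent's ChainID
            layer = "sha256:" + hashlib.sha256(f"{chain[-1]} {layer}".encode()).hexdigest()
        chain.append(layer)
    return chain


large = diff_id("large-content")
original, option1 = {}, {}
for base in ("base1", "base2", "base3"):
    base_layer = diff_id(f"{base} filesystem")
    # original layout: FROM <base>, then ADD the shared tarball on top
    original[base] = chain_ids([base_layer, large])[-1]
    # Option 1 layout: FROM scratch, COPY large-content, then COPY --from=base / /
    option1[base] = chain_ids([large, base_layer])[0]

print(len(set(original.values())))  # 3: one copy of the large layer per image
print(len(set(option1.values())))   # 1: one large layer shared by all three
```

In the Option 1 layout, the large layer sits directly on the empty scratch image, so its ChainID equals its DiffID and is identical in every image. The copied base filesystems then stack on top and differ per image.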
Option 2:
Reorder your build to keep common components at the top. I understand this won't work for you, but it may help others coming across this question later. It's the preferred and simplest solution that most people use.
Option 3:
Remove the large content from your image and add it to your containers externally as a volume. You lose the immutability and copy-on-write features of layers in the Docker filesystem. You'll also need to ship the volume content to each of your Docker hosts manually (or use a network shared filesystem). I've seen solutions where a "sync container" runs on each Docker host and performs a git pull, rsync, or any other equivalent command to keep the volume updated. If you can, consider mounting the volume with :ro at the end to make it read-only inside the container that uses it, which gives you immutability.
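The core of such a sync container is a one-way mirror from a source into the volume directory. In practice that is rsync -a --delete or a git pull. The sketch below is a hypothetical stand-in that uses only the Python standard library, to show what the step has to do: copy new or changed files, and delete files that have disappeared from the source. A real sync container would run it in a loop or on a schedule.

```python
import filecmp
import os
import shutil
import tempfile


def _remove(path):
    """Delete a file, symlink or directory tree."""
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
    else:
        os.remove(path)


def mirror(src, dst):
    """Make dst match src: copy new/changed files, delete files gone from src.

    Symlinks in src are followed rather than recreated, unlike rsync -a.
    """
    os.makedirs(dst, exist_ok=True)
    src_names = set(os.listdir(src))
    for name in os.listdir(dst):
        if name not in src_names:
            _remove(os.path.join(dst, name))  # deleted upstream
    for name in src_names:
        s, d = os.path.join(src, name), os.path.join(dst, name)
        if os.path.isdir(s):
            if os.path.lexists(d) and not os.path.isdir(d):
                _remove(d)  # a file was replaced by a directory
            mirror(s, d)
        elif os.path.isdir(d) or not os.path.exists(d) or not filecmp.cmp(s, d, shallow=True):
            # filecmp treats files with equal type, size and mtime as equal;
            # otherwise it compares sizes and contents
            if os.path.lexists(d):
                _remove(d)
            shutil.copy2(s, d)  # copy2 keeps the mtime for the next comparison


if __name__ == "__main__":
    # Demo on throwaway directories
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        with open(os.path.join(src, "pom.xml"), "w") as f:
            f.write("<project/>")
        open(os.path.join(dst, "stale.txt"), "w").close()
        mirror(src, dst)
        print(sorted(os.listdir(dst)))  # ['pom.xml']
```

Because the mirror only ever writes to the volume, the containers that consume it can keep mounting it with :ro.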