What causes a cache invalidation when building a Dockerfile?

I've been reading the docs Best practices for writing Dockerfiles and came across a small inaccuracy (IMHO) whose meaning became clear only after reading further:

Using apt-get update alone in a RUN statement causes caching issues and subsequent apt-get install instructions fail.

Why would it fail, I wondered. The explanation of what they mean by "fail" comes later:

Because the apt-get update is not run, your build can potentially get an outdated version of the curl and nginx packages.
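
To make that concrete, my reading of the warning is a Dockerfile like the following (my own sketch, not copied from the docs):

FROM ubuntu:18.04

# Splitting these into two RUN instructions is the pattern being warned about:
RUN apt-get update
RUN apt-get install -y curl nginx

If only the install line is edited later, the cached apt-get update layer is reused, so the install may pull packages from a stale index.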

However, in the following passage I still cannot understand what they mean by "If not, the cache is invalidated":

Starting with a parent image that is already in the cache, the next instruction is compared against all child images derived from that base image to see if one of them was built using the exact same instruction. If not, the cache is invalidated.

That part is mentioned in some answers on SO, e.g. How does Docker know when to use the cache during a build and when not?, and the concept of cache invalidation as a whole is clear to me; I've read the following:

When does Docker image cache invalidation occur? Which algorithm does Docker use to invalidate the cache?

But what is the meaning of "if not"? At first I was sure the phrase meant "if no such image is found". That would be overkill: invalidating a cache that might still be useful for other builds. And indeed, the cache is not invalidated when no matching image is found, as my experiments below show:

$ docker build -t alpine:test1 - <<HITTT
> FROM apline
> RUN echo "test1"
> RUN echo "test1-2"
> HITTT
Sending build context to Docker daemon  3.072kB
Step 1/3 : FROM apline
pull access denied for apline, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
(base) nb0408:docker a.martianov$ docker build -t alpine:test1 - <<HITTT
> FROM alpine
> RUN echo "test1"
> RUN echo "test1-2"
> HITTT
Sending build context to Docker daemon  3.072kB
Step 1/3 : FROM alpine
 ---> 965ea09ff2eb
Step 2/3 : RUN echo "test1"
 ---> Running in 928453d33c7c
test1
Removing intermediate container 928453d33c7c
 ---> 0e93df31058d
Step 3/3 : RUN echo "test1-2"
 ---> Running in b068bbaf8a75
test1-2
Removing intermediate container b068bbaf8a75
 ---> daeaef910f21
Successfully built daeaef910f21
Successfully tagged alpine:test1

$ docker build -t alpine:test1-1 - <<HITTT
> FROM alpine
> RUN echo "test1"
> RUN echo "test1-3"
> HITTT
Sending build context to Docker daemon  3.072kB
Step 1/3 : FROM alpine
 ---> 965ea09ff2eb
Step 2/3 : RUN echo "test1"
 ---> Using cache
 ---> 0e93df31058d
Step 3/3 : RUN echo "test1-3"
 ---> Running in 74aa60a78ae1
test1-3
Removing intermediate container 74aa60a78ae1
 ---> 266bcc6933a8
Successfully built 266bcc6933a8
Successfully tagged alpine:test1-1

$ docker build -t alpine:test1-2 - <<HITTT
> FROM alpine
> RUN "test2"
> RUN 
(base) nb0408:docker a.martianov$ docker build -t alpine:test2 - <<HITTT
> FROM alpine
> RUN echo "test2"
> RUN echo "test1-3"
> HITTT
Sending build context to Docker daemon  3.072kB
Step 1/3 : FROM alpine
 ---> 965ea09ff2eb
Step 2/3 : RUN echo "test2"
 ---> Running in 1a058ddf901c
test2
Removing intermediate container 1a058ddf901c
 ---> cdc31ac27a45
Step 3/3 : RUN echo "test1-3"
 ---> Running in 96ddd5b0f3bf
test1-3
Removing intermediate container 96ddd5b0f3bf
 ---> 7d8b901f3939
Successfully built 7d8b901f3939
Successfully tagged alpine:test2

$ docker build -t alpine:test1-3 - <<HITTT
> FROM alpine
> RUN echo "test1"
> RUN echo "test1-3"
> HITTT
Sending build context to Docker daemon  3.072kB
Step 1/3 : FROM alpine
 ---> 965ea09ff2eb
Step 2/3 : RUN echo "test1"
 ---> Using cache
 ---> 0e93df31058d
Step 3/3 : RUN echo "test1-3"
 ---> Using cache
 ---> 266bcc6933a8
Successfully built 266bcc6933a8
Successfully tagged alpine:test1-3

The cache was used again for the last build. So what do the docs mean by "if not"?

Asked Dec 11 '19 by Alexei Martianov


1 Answer

Let's focus on your original problem (regarding apt-get update) to make things easier. The following example does not follow any best practices; it just illustrates the point you are trying to understand.

Suppose you have the following Dockerfile:

FROM ubuntu:18.04

RUN apt-get update
RUN apt-get install -y nginx

You build a first image using docker build -t myimage:latest .

What happens is:

  • The ubuntu image is pulled if it does not exist locally
  • A layer is created and cached to run apt-get update
  • A layer is created and cached to run apt-get install -y nginx (see the docker history sketch below to inspect these layers)
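
If you want to see those cached layers for yourself, docker history lists them (myimage:latest being the tag used in the build command above):

$ docker history myimage:latest

Each row corresponds to one layer; on top of the ubuntu:18.04 layers you should see one layer for the apt-get update step and one for the install step.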

Now suppose you modify your Dockerfile to be:

FROM ubuntu:18.04

RUN apt-get update
RUN apt-get install -y nginx openssl

and you run a build again with the same command as before. What happens is:

  • There is already an ubuntu image locally, so it will not be pulled (unless you force it with --pull)
  • A layer was already created and cached for the apt-get update command against that same parent image, so the cached one is used
  • The next command has changed, so a new layer is created to install nginx and openssl. Since the apt database comes from the cached preceding layer, if a new nginx and/or openssl version has been released since then, you will not see them and will install the outdated ones (see the sketch below for forcing a cache-free build)
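
If you want to rule out the stale cache entirely without changing the Dockerfile, you can tell Docker to ignore the cache for a single build (a sketch; myimage:latest is the tag used above):

$ docker build --no-cache --pull -t myimage:latest .

--no-cache re-runs every instruction and --pull re-checks the base image, so apt-get update fetches a fresh package index.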

Does this help you grasp the concept of cached layers?

In this particular example, the best approach is to do everything in a single layer, making sure you clean up after yourself:

FROM ubuntu:18.04

RUN apt-get update  \
    && apt-get install -y nginx openssl \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
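
Build it the usual way, for example:

$ docker build -t myimage:latest .

Because update, install and cleanup now live in a single RUN instruction, any change to the package list invalidates that one layer, so apt-get update always re-runs together with the install and its index can never be stale relative to it.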

Answered Nov 11 '22 by Zeitounator