Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does Docker know when to use the cache during a build and when not?

Tags:

docker

caching

I'm amazed at how good Docker's caching of layers works but I'm also wondering how it determines whether it may use a cached layer or not.

Let's take these build steps for example:

Step 4 : RUN npm install -g   node-gyp
 ---> Using cache
 ---> 3fc59f47f6aa
Step 5 : WORKDIR /src
 ---> Using cache
 ---> 5c6956ba5856
Step 6 : COPY package.json .
 ---> d82099966d6a
Removing intermediate container eb7ecb8d3ec7
Step 7 : RUN npm install
 ---> Running in b960cf0fdd0a

For example how does it know it can use the cached layer for npm install -g node-gyp but creates a fresh layer for npm install ?

like image 437
Hedge Avatar asked Jul 29 '16 09:07

Hedge


People also ask

How does docker build cache work?

Docker's build-cache is a handy feature. It speeds up Docker builds due to reusing previously created layers. You can use the --no-cache option to disable caching or use a custom Docker build argument to enforce rebuilding from a certain step.

Where does docker store build cache?

About the Docker Build Cache Docker images are built in layers, where each layer is an instruction from a Dockerfile. Layers stack on top of each other, adding functionality incrementally. The build process knew the Dockerfile didn't change, so it used the cache from the last build for all four layers.

How does docker cache run layers?

About Layer Caching in Docker Docker creates container images using layers. Each command that is found in a Dockerfile creates a new layer. Each layer contains the filesystem changes to the image for the state before the execution of the command and the state after the execution of the command.

Does docker pull cache?

Pulling cached images After you configure the Docker daemon to use the Container Registry cache, Docker performs the following steps when you pull a public Docker Hub image with a docker pull command: The Docker daemon checks the Container Registry cache and fetches the images if it exists.


2 Answers

The build cache process is explained fairly thoroughly in the Best practices for writing Dockerfiles: Leverage build cache section.

  • Starting with a parent image that is already in the cache, the next instruction is compared against all child images derived from that base image to see if one of them was built using the exact same instruction. If not, the cache is invalidated.

  • In most cases, simply comparing the instruction in the Dockerfile with one of the child images is sufficient. However, certain instructions require more examination and explanation.

  • For the ADD and COPY instructions, the contents of the file(s) in the image are examined and a checksum is calculated for each file. The last-modified and last-accessed times of the file(s) are not considered in these checksums. During the cache lookup, the checksum is compared against the checksum in the existing images. If anything has changed in the file(s), such as the contents and metadata, then the cache is invalidated.

  • Aside from the ADD and COPY commands, cache checking does not look at the files in the container to determine a cache match. For example, when processing a RUN apt-get -y update command the files updated in the container are not examined to determine if a cache hit exists. In that case just the command string itself is used to find a match.

Once the cache is invalidated, all subsequent Dockerfile commands generate new images and the cache is not used.

You will run into situations where OS packages, NPM packages or a Git repo are updated to newer versions (say a ~2.3 semver in package.json) but as your Dockerfile or package.json hasn't updated, docker will continue using the cache.

It's possible to programatically generate a Dockerfile that busts the cache by modifying lines on certain smarter checks (e.g retrieve the latest git branch shasum from a repo to use in the clone instruction). You can also periodically run the build with --no-cache=true to enforce updates.

like image 180
Matt Avatar answered Sep 19 '22 09:09

Matt


It's because your package.json file has been modified, see Removing intermediate container.

That's also usually the reason why package-manager (vendor/3rd-party) info files are COPY'ed first during docker build. After that you run the package-manager installation, and then you add the rest of your application, i.e. src.

If you've no changes to your libs, these steps are served from the build cache.

like image 25
schmunk Avatar answered Sep 20 '22 09:09

schmunk