 

Dockerfile: Benefits of repeated apt cache cleans

In the quest for ever smaller Docker images, it's common to remove the apt package cache (for Debian/Ubuntu based images) after installing packages, with something like

RUN rm -rf /var/lib/apt/lists/*

I've seen a few Dockerfiles where this is done after each package installation (example), i.e. with the pattern

# Install some package
RUN apt-get update \
    && apt-get install -y <some-package> \
    && rm -rf /var/lib/apt/lists/*

# Do something
...

# Install another package
RUN apt-get update \
    && apt-get install -y <another-package> \
    && rm -rf /var/lib/apt/lists/*

# Do something else
...

Are there any benefits of doing this, rather than only cleaning the apt cache at the very end (and thus only updating it once at the beginning)? To me it seems like having to remove and update the cache multiple times just slows down the image build.

asked May 24 '20 by jmd_dk


People also ask

How does Dockerfile cache work?

Each command in a Dockerfile creates a new layer. Each layer contains the filesystem changes made to the image between the state before the command was executed and the state after. Docker uses a layer cache to optimize and speed up the process of building Docker images.
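As a rough sketch of how this plays out (base image, package, and paths below are placeholders), instructions near the top of a Dockerfile are reused from the build cache on rebuilds as long as they and the layers above them are unchanged:

FROM ubuntu:22.04

# Reused from the build cache on every rebuild while this instruction
# (and the base image above it) stays unchanged.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Changing files under app/ invalidates the cache from this point down,
# but the RUN layer above is still reused.
COPY app/ /opt/app/
CMD ["/opt/app/run.sh"]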

What is rm -rf /var/lib/apt/lists/*?

rm -rf /var/lib/apt/lists/* removes the package lists that apt uses to figure out which packages are available to install.
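For example (the package names here are illustrative), once the lists are removed, a later install step would typically fail until apt-get update regenerates them:

RUN apt-get update \
    && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*

# This would fail with "Unable to locate package", because the lists are gone:
# RUN apt-get install -y git

# This works, because apt-get update downloads fresh lists first:
RUN apt-get update \
    && apt-get install -y git \
    && rm -rf /var/lib/apt/lists/*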

Can I use from twice in Dockerfile?

FROM can appear multiple times within a single Dockerfile in order to create multiple images. Simply make a note of the last image ID output by the commit before each new FROM command.
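In modern Dockerfiles the most common reason to use multiple FROM instructions is a multi-stage build, where only the final stage ends up in the resulting image. A minimal sketch (the stage name, base images, and build command are illustrative):

# First stage: build the application
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN go build -o /out/app .

# Second stage: only this stage ends up in the final image
FROM debian:bookworm-slim
COPY --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]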

What is Workdir in Dockerfile?

The WORKDIR instruction sets the working directory for all subsequent Dockerfile instructions. Some frequently used instructions in a Dockerfile are RUN, ADD, CMD, ENTRYPOINT, and COPY. If the WORKDIR directory does not already exist, it is created automatically during the processing of the instructions.
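A minimal illustration (paths and file names are placeholders):

FROM ubuntu:22.04

# Created automatically if it does not already exist.
WORKDIR /opt/app

# Copied to /opt/app/app.tar.gz because of the WORKDIR above.
COPY app.tar.gz .

# Runs with /opt/app as the current directory.
RUN tar -xzf app.tar.gz

# CMD (and ENTRYPOINT) also start in /opt/app.
CMD ["./start.sh"]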


1 Answer

The main reason people do this is to minimise the amount of data stored in that particular Docker layer. When pulling a Docker image, you have to pull the entire content of each of its layers.

For example, imagine the following two layers in the image:

RUN apt-get update
RUN rm -rf /var/lib/apt/lists/*

The first RUN command results in a layer containing the lists, which will ALWAYS be pulled by anyone using your image, even though the next command removes those files (so they're not accessible). Ultimately those extra files are just a waste of space and time.

On the other hand,

RUN apt-get update && rm -rf /var/lib/apt/lists/*

Because everything happens within a single layer, the lists are deleted before the layer is committed, so they are never pushed or pulled as part of the image.
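One way to see the difference for yourself is docker history, which lists the size contributed by each layer (the image name is a placeholder):

docker history my-image:latest

With the two-layer version, the apt-get update layer typically weighs in at tens of megabytes of package lists; with the combined version, that layer shrinks to almost nothing.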

So, why have multiple layers which use apt-get install? This is likely so that the layers can be reused by other images: Docker shares layers between images if they are identical, in order to save space on the server and speed up builds and pulls.
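For example, two images built from the same base on the same builder can share the layer produced by an identical install step (package names are illustrative):

# image-a/Dockerfile
FROM ubuntu:22.04
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
RUN apt-get update \
    && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*

# image-b/Dockerfile
FROM ubuntu:22.04
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

Because the curl layer is built from the same parent with the same instruction, the build cache produces the identical layer for both images, so it only needs to be stored and transferred once.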

answered Oct 22 '22 by Ben XO