
Impact of yum install on Docker layer size

I have been using Docker for some time, although I don't have an IT background.

I am now trying to understand how to minimize the size of my Docker images by optimizing my Dockerfiles. Along the way I ran into a minimal reproducible case that I don't understand. I would be very glad if anyone could share their ideas or provide an explanation.

I start from an official centos:7 image (7e6257c9f8d8; 203MB). Then, I prepare the following Dockerfile:

FROM centos:7
RUN yum -y install nano && yum -y clean all && rm -fr /var/cache
RUN yum -y install which && yum -y clean all && rm -fr /var/cache
RUN yum -y install which && yum -y clean all && rm -fr /var/cache

The idea is to install some lightweight package and measure the impact on image size. To do this, I first install nano, then which in a separate layer. I then try to install which a second time (yum reports that there is nothing to do). In addition, I run yum clean all to clear the yum cache and, just in case, I delete the /var/cache directory (it is empty in the base image, and I checked that removing this command does not change the result).

The result is the following:

IMAGE               CREATED             CREATED BY                                      SIZE  
6a14537d3460        7 seconds ago       /bin/sh -c yum -y install which && yum -y cl…   23.9MB
7d924cbdf819        22 seconds ago      /bin/sh -c yum -y install which && yum -y cl…   24.2MB
2b5b04d37a64        42 seconds ago      /bin/sh -c yum -y install nano && yum -y cle…   24.6MB

The installed size of which is 75 kB and that of nano is 1.6 MB, and I don't see any additional dependencies being installed.

My question is: why does each of these install commands add a ~24 MB layer to the final image, even when no package is actually installed?

Thanks in advance to the community :)

J. Berzosa asked Oct 16 '22 01:10



1 Answer

Each RUN instruction creates a new Docker layer.

Docker is not smart enough to detect that an instruction actually did nothing.

It faithfully stores the new layer in the resulting image.

That's why you should try to minimize the number of RUN instructions where possible.

In your case, you can use a single RUN instruction:

RUN yum -y install nano which && yum -y clean all && rm -fr /var/cache 
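To check the effect, the split and the single-RUN variants can be built side by side and their per-layer sizes compared, like the sizes listed in the question. This is a minimal bash sketch, assuming a running Docker daemon; the function name `build_and_show_layers` and the tag `yum-single` are hypothetical.

```shell
# Bash sketch (hypothetical helper): build an image from a Dockerfile piped
# on stdin and list the size of every layer it created.
# Assumes a running Docker daemon. Reading the Dockerfile from stdin means
# there is no build context, which is fine because these files use no COPY/ADD.
build_and_show_layers() {
  local tag="$1"
  # Build quietly; stop if the build fails.
  docker build -t "$tag" - >/dev/null || return 1
  # {{.Size}} and {{.CreatedBy}} are docker history template fields;
  # docker turns the literal \t into a tab.
  docker history --no-trunc --format '{{.Size}}\t{{.CreatedBy}}' "$tag"
}

# Usage (hypothetical tag), for the single-RUN variant:
# printf '%s\n' 'FROM centos:7' \
#   'RUN yum -y install nano which && yum -y clean all && rm -fr /var/cache' \
#   | build_and_show_layers yum-single
```

Running the same helper on the question's three-RUN Dockerfile gives one size line per layer for direct comparison.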

UPDATE

Let's run an experiment:

FROM centos
RUN yum -y install which
RUN yum -y install which
RUN yum -y install which
RUN yum -y install which
RUN yum -y install which
RUN yum -y install which
RUN yum -y install which
RUN yum -y install which
RUN yum -y install which
RUN yum -y install which

10 RUN instructions, 9 of them "doing nothing".
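Instead of typing the ten identical lines, the same Dockerfile can be generated. A small bash sketch, where the function name `make_repeat_dockerfile` and its default of 10 are illustrative only:

```shell
# Print "FROM centos" followed by N identical RUN lines (default 10),
# reproducing the experiment's Dockerfile. Hypothetical helper name.
make_repeat_dockerfile() {
  local n="${1:-10}"
  echo 'FROM centos'
  for _ in $(seq "$n"); do
    echo 'RUN yum -y install which'
  done
}

# Usage (needs Docker): make_repeat_dockerfile 10 | docker build -
```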

Let's build it and look at the intermediate images:

$ docker build .
...
$ docker images -a
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
<none>              <none>              fbd86aedc782        5 seconds ago       263MB
<none>              <none>              ca70a4bbe722        7 seconds ago       261MB
<none>              <none>              bd11e0ab02fb        9 seconds ago       259MB
<none>              <none>              68c20ddfcaad        11 seconds ago      257MB
<none>              <none>              314a6501ad23        13 seconds ago      255MB
<none>              <none>              42a62294a5e7        16 seconds ago      253MB
<none>              <none>              16fad39b9c27        18 seconds ago      251MB
<none>              <none>              6769fe69c9e1        19 seconds ago      249MB
<none>              <none>              49cef483e732        21 seconds ago      248MB
<none>              <none>              c4c92c39f2a4        23 seconds ago      246MB
centos              latest              0d120b6ccaa8        3 weeks ago         215MB

Each subsequent layer that is "doing nothing" adds about 2 MB. (I can't explain the ~24 MB reported in the OP's question.)
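`docker images -a` reports cumulative image sizes, so the growth of each layer is the difference between consecutive rows (for example, 263MB - 261MB = 2MB). A small awk sketch, assuming rows are listed newest first and every size is printed in MB, as in the listing above; the function name `layer_growth_mb` is hypothetical:

```shell
# Read `docker images -a`-style rows on stdin and print the growth of each
# layer in MB, oldest first. Rows whose last field does not end in "MB"
# (such as the header) are skipped.
layer_growth_mb() {
  awk '$NF ~ /MB$/ { s = $NF; sub(/MB$/, "", s); sizes[++n] = s + 0 }
       END { for (i = n; i > 1; i--) print sizes[i-1] - sizes[i] }'
}

# Usage: docker images -a | layer_growth_mb
```

Fed the listing above, it prints 31 for the layer that actually installs which, followed by nine deltas of 1 to 2 MB (the displayed sizes are rounded). Other unrelated images in the list would need to be filtered out first.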

UPDATE 2

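On advice from emix: using dive, I immediately found files under /var/lib/rpm and /var/log that were changed in every layer. Since a modified file is copied into the new layer in full (copy-on-write), these changes are enough to make each "doing nothing" layer take up space.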
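dive is one way to see this; Docker's own `docker diff` gives a quicker, text-only view of which files a single command adds (A), changes (C) or deletes (D). A minimal bash sketch, assuming Docker is available and the CentOS 7 repositories are still reachable; the helper name `probe_changes` is hypothetical:

```shell
# Run one command in a throwaway container, list the files it changed
# relative to the image, then remove the container. Hypothetical helper.
probe_changes() {
  local image="$1"; shift
  local name="probe-$$-$RANDOM"
  # Ignore the command's exit status so the diff is still shown.
  docker run --name "$name" "$image" "$@" >/dev/null || true
  docker diff "$name"
  docker rm "$name" >/dev/null
}

# Usage, against the question's final image (which already has `which`):
#   probe_changes 6a14537d3460 yum -y install which
```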

Alex Yu answered Oct 21 '22 02:10
