I have been using Docker for some time, although I don't have an IT background.
Now I am trying to understand how to minimize the size of my Docker images by optimizing my Dockerfiles. While doing so, I ran into a minimal reproducible case that I don't understand. I would be very glad if anyone could share ideas or provide an explanation.
I start from an official centos:7 image (7e6257c9f8d8; 203MB). Then, I prepare the following Dockerfile:
FROM centos:7
RUN yum -y install nano && yum -y clean all && rm -fr /var/cache
RUN yum -y install which && yum -y clean all && rm -fr /var/cache
RUN yum -y install which && yum -y clean all && rm -fr /var/cache
The idea is to install some lightweight package and evaluate the impact on image size. For this I install nano first, followed by which in a separate layer. I then add a second attempt to install which (yum reports that there is nothing to do). Moreover, I add yum clean all statements to clean the yum cache and, just in case (even though I checked that the result does not change if I remove this command), I delete the /var/cache dir (it is empty in the base image).
The result is the following:
IMAGE CREATED CREATED BY SIZE
6a14537d3460 7 seconds ago /bin/sh -c yum -y install which && yum -y cl… 23.9MB
7d924cbdf819 22 seconds ago /bin/sh -c yum -y install which && yum -y cl… 24.2MB
2b5b04d37a64 42 seconds ago /bin/sh -c yum -y install nano && yum -y cle… 24.6MB
The installed size of which is 75k and the installed size of nano is 1.6M. I did not see any additional dependencies being installed.
The question is: why does each of these install commands add a ~24MB layer to the final image, even when no packages are actually installed?
Thanks in advance to the community :)
Each RUN instruction creates a new Docker layer.
Docker is not smart enough to detect that an instruction effectively did nothing.
It faithfully stores a new layer in the resulting image.
That's why you should try to minimize the number of Dockerfile instructions where possible.
In your case you can use just one RUN instruction:
RUN yum -y install nano which && yum -y clean all && rm -fr /var/cache
UPDATE
Let's make an experiment:
FROM centos
RUN yum -y install which
RUN yum -y install which
RUN yum -y install which
RUN yum -y install which
RUN yum -y install which
RUN yum -y install which
RUN yum -y install which
RUN yum -y install which
RUN yum -y install which
RUN yum -y install which
10 RUN instructions, 9 of them "doing nothing".
Let's build it and look at the intermediate images:
$ docker build .
...
$ docker images -a
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> fbd86aedc782 5 seconds ago 263MB
<none> <none> ca70a4bbe722 7 seconds ago 261MB
<none> <none> bd11e0ab02fb 9 seconds ago 259MB
<none> <none> 68c20ddfcaad 11 seconds ago 257MB
<none> <none> 314a6501ad23 13 seconds ago 255MB
<none> <none> 42a62294a5e7 16 seconds ago 253MB
<none> <none> 16fad39b9c27 18 seconds ago 251MB
<none> <none> 6769fe69c9e1 19 seconds ago 249MB
<none> <none> 49cef483e732 21 seconds ago 248MB
<none> <none> c4c92c39f2a4 23 seconds ago 246MB
centos latest 0d120b6ccaa8 3 weeks ago 215MB
I see that each subsequent layer that "does nothing" adds ~2 MB. (I can't explain the ~24 MB from the OP's question.)
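To make the per-layer cost explicit, here is a small sketch that turns the cumulative sizes from the listing above into per-image deltas. The sizes are copied by hand from that listing, and layer_deltas is a hypothetical helper name, not a Docker command. On a real image, docker history <image> reports per-layer sizes directly, whereas docker images -a shows cumulative ones.

```shell
# Cumulative image sizes in MB, oldest first, copied from the listing above.
sizes="215 246 248 249 251 253 255 257 259 261 263"

# Hypothetical helper: print how much each image adds on top of its parent.
layer_deltas() {
  echo "$1" | tr ' ' '\n' | awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}

layer_deltas "$sizes"
```

The first delta (31 MB) is the RUN that actually installs which, with no cache cleanup in this experiment. The remaining nine are the "nothing to do" runs, at 1 to 2 MB each.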
UPDATE 2
On emix's advice I used dive and immediately found files that were changed in every layer, in /var/lib/rpm (the RPM database) and /var/log.
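Why would a few touched files cost megabytes? As far as I understand, image layers record changes per file: if a command modifies a file, the new layer stores a complete copy of it, not just the changed bytes. The sketch below imitates that with plain directories and does not use Docker at all. The 1 MiB file db is a hypothetical stand-in for a package database, and layer_bytes is an illustrative helper.

```shell
# Simulate file-granular layers: each "layer" directory holds a full copy
# of every file that changed relative to the previous layer.
work=$(mktemp -d)
mkdir "$work/base"
# Hypothetical 1 MiB stand-in for a package database file.
head -c 1048576 /dev/zero > "$work/base/db"

prev="$work/base"
for i in 1 2 3; do
  mkdir "$work/layer$i"
  cp "$prev/db" "$work/layer$i/db"    # the whole file is copied up
  printf 'x' >> "$work/layer$i/db"    # then changed by a single byte
  prev="$work/layer$i"
done

# Report how many bytes each layer has to store.
layer_bytes() { wc -c < "$1/db" | tr -d ' '; }
for i in 1 2 3; do
  echo "layer$i: $(layer_bytes "$work/layer$i") bytes"
done
# rm -rf "$work"   # clean up when done
```

Each simulated layer stores just over 1 MiB for a one-byte change. If yum rewrites its database and log files on every invocation, that would explain a roughly constant overhead per RUN, and a larger database would mean a larger overhead.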