I read that docker works with layers, so when creating a container
with a Dockerfile
, you start with the base image, then subsequent commands run add a layer to the container, so if you save the state of that new container, you have a new image. There are a couple of things I'm wondering about this.
If I start from a Ubuntu
image, which is pretty big and bulky since its a complete OS, then I add a few tools to it and save this as a new image which I upload to the hub. If someone downloads my image, and they already have a Ubuntu image saved in their images folder
, does this mean they can skip downloading Ubuntu
since they already have the image? If so, how does this work when I modify parts of the original image, does Docker use its cached data to selectively apply those changes to the Ubuntu image
after it loads it?
2.) How do I update an image that I built by modifying the Dockerfile? I setup a simple django project with this Dockerfile
:
FROM python:3.5 ENV PYTHONBUFFERED 1 ENV APPLICATION_ROOT /app ENV APP_ENVIRONMENT L RUN mkdir -p $APPLICATION_ROOT WORKDIR $APPLICATION_ROOT ADD requirements.txt $APPLICATION_ROOT RUN pip install --upgrade pip RUN pip install -r requirements.txt ADD . $APPLICATION_ROOT
and used this to create the image in the beginning. So everytime I create a box, it loads all these environment variables
, if I rebuild the box completely it reinstalls the packages and all the extras. I need to add a new environment variable, so I added it to the bottom of the Dockerfile
, along with a test variable:
ENV COMPOSE_CONVERT_WINDOWS_PATHS 1 ENV TEST_ENV_VAR TEST
When I delete the container and the image, and build a new container, it all seems to go accordingly, it tells me that it creates the new Step 4 : ENV
COMPOSE_CONVERT_WINDOWS_PATHS 1 ---> Running in 75551ea311b2 ---> b25b60e29f18 Removing intermediate container 75551ea311b2
So its like something gets lost in some of these intermediate container transitions. Is this how the caching system works, every new layer is an intermediate container
? So with that in mind, how do you add a new layer, do you always have to add the new data at the bottom of the Dockerfile? Or would it be better to leave the Dockerfile alone once the image is built, and just modify the container
and built a new image?
EDIT I just tried installing an image, a package called bwawrik/bioinformatics
, which is a CentOS based container which has a wide range of tools installed.
It froze half way through, so I exited it and then ran it again to see if everything was installed:
$ docker pull bwawrik/bioinformatics Using default tag: latest latest: Pulling from bwawrik/bioinformatics a3ed95caeb02: Already exists a3ed95caeb02: Already exists 7e78dbe53fdd: Already exists ebcc98113eaa: Already exists 598d3c8fd678: Already exists 12520d1e1960: Already exists 9b4912d2bc7b: Already exists c64f941884ae: Already exists 24371a4298bf: Already exists 993de48846f3: Already exists 2231b3c00b9e: Already exists 2d67c793630d: Already exists d43673e70e8e: Already exists fe4f50dda611: Already exists 33300f752b24: Already exists b4eec31201d8: Already exists f34092f697e8: Already exists e49521d8fb4f: Already exists 8349c93680fe: Already exists 929d44a7a5a1: Already exists 09a30957f0fb: Already exists 4611e742e0b5: Already exists 25aacf0148db: Already exists 74da82504b6c: Already exists 3e0aac083b86: Already exists f52c7e0ac000: Already exists 35eee92aaf2f: Already exists 5f6d8eb70885: Already exists 536920bfe266: Already exists 98638e678c51: Already exists 9123956b991d: Already exists 1c4c8a29cd65: Already exists 1804bf352a97: Already exists aa6fe9359956: Already exists e7e38d1250a9: Already exists 05e935c831dc: Already exists b7dfc22c26f3: Already exists 1514d4797ffd: Already exists Digest: sha256:0391808e21b7b5cc0eb44fc2dad0d7f5415115bdaafb4534c0b6a12efd47a88b Status: Image is up to date for bwawrik/bioinformatics:latest
So it definitely installed the package in pieces, not all in one go. Are these pieces, different images?
To upgrade Docker Engine, first run sudo apt-get update , then follow the installation instructions, choosing the new version you want to install.
We found that while updates to base images came in around once a month, non-OS images updated much more frequently – up to eight to ten updates in the case of the widely used node:latest and php:latest packages.
To push an image to Docker Hub, you must first name your local image using your Docker Hub username and the repository name that you created through Docker Hub on the web. You can add multiple images to a repository by adding a specific :<tag> to them (for example docs/base:testing ).
First, let me clarify some terminology.
image: A static, immutable object. This is the thing you build when you run docker build
using a Dockerfile
. An image is not a thing that runs.
Images are composed of layers. an image might have only one layer, or it might have many layers.
container: A running thing. It uses an image as its starting template.
This is similar to a binary program and a process. You have a binary program on disk (such as /bin/sh
), and when you run it, it is a process on your system. This is similar to the relationship between images and containers.
You can build your own image from a base image (such as ubuntu
in your example). Some commands in your Dockerfile
will create a new layer in the ultimate image. Some of those are RUN
, COPY
, and ADD
.
The very first layer has no parent layer. But every other layer will have a parent layer. In this way they link to one another, stacking up like pancakes.
Each layer has a unique ID (the long hexadecimal hashes you have already seen). They can also have human-friendly names, known as tags (e.g. ubuntu:16.04
).
Technically, each layer is also an image. If you build a new image and it has 5 layers, you can use that image and it will contain all 5 layers. If you run a container using the third layer in the stack as your image ID, you can do that too - but it would only contain 3 layers. The one you specify and the two that are its ancestors.
But as a matter of convention, the term "image" generally means the layer that has a tag associated. When you run docker images
, it will show you all of the top-level images, and hide the layers beneath (but you can show them all with -a
).
When docker build
runs, it does all of its work inside of containers (naturally!) So if it encounters a RUN
step, it will create a container from the current top layer, run the specified commands in there, and then save the result as a new layer. Then it will create a container from this new layer, run the next thing... etc.
The intermediate containers are only used for the build process, and are discarded after the build.
You asked whether someone downloading your ubuntu
-based image are only doing a partial download, if they already had the ubuntu
image locally.
Yes! That's exactly right.
Every layer uses the layer beneath it as a base. The new layer is basically a diff between that layer and a new state. It's not a diff in the same way as a git commit might work, though. It works at the file level, not at a the line level.
Say you started from ubuntu
, and you ran this Dockerfile.
FROM: ubuntu:16.04 RUN groupadd dan && useradd -g dan dan
This would result in a two layer image. The first layer would be the ubuntu
image. The second would probably have only a handful of changes.
/etc/passwd
with user "dan"/etc/group
with group "dan"/home/dan
/home/dan/.bashrc
And that's it. If you start a container from this image, those few files would be in the topmost layer, and everything else would come from the filesystem in the ubuntu
image.
One other point. When you run a container, you can write files in the filesystem. But if you stop the container and run another container from the same image, everything is reset. So where are the files written?
Images are immutable, so once they are created, they can't be changed. You can build a new version, but that's a new image. It would have a different ID and would not be the same image.
A container has a top-level read-write layer which is put on top of the image layers. Any writes happen in that layer. It works just like the other layers. If you need to modify a file (or add one, or delete one), that is done in the top layer, and doesn't affect the lower layers. If the file exists already, it is copied into the read-write layer, and then modified. This is known as copy-on-write (CoW).
Do you have to add new things to the bottom of Dockerfile? No, you can add anything anywhere (or change anything).
However, how you do things does affect your build times because of how the build caching works.
Docker will try to cache results during builds. If it finds as it reads through Dockerfile that the FROM
is the same, the first RUN
is the same, the second RUN
is the same... it will assume it has already done those steps, and will use cached results. If it encounters something that is different from the last build, it will invalidate the cache. Everything from that point on will be re-run fresh.
Some things will always invalidate the cache. For instance if you use ADD
or COPY
, those always invalidate the cache. That's because Docker only keeps track of what the build commands are. It doesn't try to figure out "is this version of the file I'm copying the same one as last time?"
So it is a common practice to start with FROM
, then put very static things like RUN
commands that install packages with e.g. apt-get
, etc. Those things tend to not change a lot after your Dockerfile has been initially written. Later in the file is a more convenient place to put things that change more often.
It's hard to concisely give good advice on this, because it really depends on the project in question. But it pays to learn how the build caching works and try to take advantage of it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With