
How do I update docker images?


I read that Docker works with layers, so when creating a container with a Dockerfile, you start with the base image, and each subsequent command adds a layer to the container; if you save the state of that new container, you have a new image. There are a couple of things I'm wondering about this.

1.) Suppose I start from an Ubuntu image, which is pretty big and bulky since it's a complete OS, add a few tools to it, and save this as a new image which I upload to the hub. If someone downloads my image, and they already have an Ubuntu image saved in their images folder, does this mean they can skip downloading Ubuntu since they already have the image? If so, how does this work when I modify parts of the original image? Does Docker use its cached data to selectively apply those changes to the Ubuntu image after it loads it?

2.) How do I update an image that I built by modifying the Dockerfile? I set up a simple Django project with this Dockerfile:

    FROM python:3.5

    ENV PYTHONBUFFERED 1
    ENV APPLICATION_ROOT /app
    ENV APP_ENVIRONMENT L

    RUN mkdir -p $APPLICATION_ROOT
    WORKDIR $APPLICATION_ROOT
    ADD requirements.txt $APPLICATION_ROOT
    RUN pip install --upgrade pip
    RUN pip install -r requirements.txt
    ADD . $APPLICATION_ROOT

and used this to create the image in the beginning. So every time I create a box, it loads all these environment variables, and if I rebuild the box completely it reinstalls the packages and all the extras. I need to add a new environment variable, so I added it to the bottom of the Dockerfile, along with a test variable:

    ENV COMPOSE_CONVERT_WINDOWS_PATHS 1
    ENV TEST_ENV_VAR TEST

When I delete the container and the image, and build a new container, it all seems to go as expected. It tells me that it creates the new variable:

    Step 4 : ENV COMPOSE_CONVERT_WINDOWS_PATHS 1
     ---> Running in 75551ea311b2
     ---> b25b60e29f18
    Removing intermediate container 75551ea311b2

So it's like something gets lost in some of these intermediate container transitions. Is this how the caching system works, with every new layer being an intermediate container? With that in mind, how do you add a new layer? Do you always have to add the new data at the bottom of the Dockerfile? Or would it be better to leave the Dockerfile alone once the image is built, and just modify the container and build a new image?

EDIT: I just tried installing an image, bwawrik/bioinformatics, which is a CentOS-based container with a wide range of tools installed.

It froze halfway through, so I exited it and then ran it again to see if everything was installed:

    $ docker pull bwawrik/bioinformatics
    Using default tag: latest
    latest: Pulling from bwawrik/bioinformatics

    a3ed95caeb02: Already exists
    a3ed95caeb02: Already exists
    7e78dbe53fdd: Already exists
    ebcc98113eaa: Already exists
    598d3c8fd678: Already exists
    12520d1e1960: Already exists
    9b4912d2bc7b: Already exists
    c64f941884ae: Already exists
    24371a4298bf: Already exists
    993de48846f3: Already exists
    2231b3c00b9e: Already exists
    2d67c793630d: Already exists
    d43673e70e8e: Already exists
    fe4f50dda611: Already exists
    33300f752b24: Already exists
    b4eec31201d8: Already exists
    f34092f697e8: Already exists
    e49521d8fb4f: Already exists
    8349c93680fe: Already exists
    929d44a7a5a1: Already exists
    09a30957f0fb: Already exists
    4611e742e0b5: Already exists
    25aacf0148db: Already exists
    74da82504b6c: Already exists
    3e0aac083b86: Already exists
    f52c7e0ac000: Already exists
    35eee92aaf2f: Already exists
    5f6d8eb70885: Already exists
    536920bfe266: Already exists
    98638e678c51: Already exists
    9123956b991d: Already exists
    1c4c8a29cd65: Already exists
    1804bf352a97: Already exists
    aa6fe9359956: Already exists
    e7e38d1250a9: Already exists
    05e935c831dc: Already exists
    b7dfc22c26f3: Already exists
    1514d4797ffd: Already exists
    Digest: sha256:0391808e21b7b5cc0eb44fc2dad0d7f5415115bdaafb4534c0b6a12efd47a88b
    Status: Image is up to date for bwawrik/bioinformatics:latest

So it definitely installed the package in pieces, not all in one go. Are these pieces different images?


asked Jan 12 '17 01:01

Horse O'Houlihan

1 Answer

image vs. container

First, let me clarify some terminology.

image: A static, immutable object. This is the thing you build when you run docker build using a Dockerfile. An image is not a thing that runs.

Images are composed of layers. An image might have only one layer, or it might have many layers.

container: A running thing. It uses an image as its starting template.

This is similar to a binary program and a process. You have a binary program on disk (such as /bin/sh), and when you run it, it becomes a process on your system. Images and containers relate to each other in the same way.

Adding layers to a base image

You can build your own image from a base image (such as ubuntu in your example). Some commands in your Dockerfile will create a new layer in the final image. Some of those are RUN, COPY, and ADD.

The very first layer has no parent layer. But every other layer will have a parent layer. In this way they link to one another, stacking up like pancakes.

Each layer has a unique ID (the long hexadecimal hashes you have already seen). They can also have human-friendly names, known as tags (e.g. ubuntu:16.04).

What is a layer vs. an image?

Technically, each layer is also an image. If you build a new image and it has 5 layers, you can use that image and it will contain all 5 layers. If you run a container using the third layer in the stack as your image ID, you can do that too, but it would only contain 3 layers: the one you specify and the two that are its ancestors.
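To picture the parent links, here is a toy model in Python. It is only a sketch, not Docker's actual storage format; the `Layer` class, the `stack` helper, and the IDs `l1` to `l5` are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    """Toy stand-in for an image layer: an ID plus a link to its parent."""
    layer_id: str
    parent: Optional["Layer"] = None


def stack(top: Layer) -> List[str]:
    """Return the IDs of `top` and all of its ancestors, base layer first."""
    ids = []
    current: Optional[Layer] = top
    while current is not None:
        ids.append(current.layer_id)
        current = current.parent
    return list(reversed(ids))


# Build a hypothetical five-layer chain: l1 <- l2 <- l3 <- l4 <- l5
layers = []
parent = None
for n in range(1, 6):
    parent = Layer(f"l{n}", parent)
    layers.append(parent)

full_image = stack(layers[4])   # start from the top layer: all five layers
third_layer = stack(layers[2])  # start from the third layer: only three
```

Starting from the top layer walks the whole chain, while starting from the third layer stops seeing anything that was stacked on top of it.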

But as a matter of convention, the term "image" generally means the layer that has a tag associated. When you run docker images, it will show you all of the top-level images, and hide the layers beneath (but you can show them all with -a).

What is an intermediate container?

When docker build runs, it does all of its work inside of containers (naturally!). So if it encounters a RUN step, it will create a container from the current top layer, run the specified commands in there, and then save the result as a new layer. Then it will create a container from this new layer, run the next thing, and so on.

The intermediate containers are only used for the build process, and are discarded after the build.
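The build loop described above can be sketched in a few lines of Python. This is a simplified model under loud assumptions: a filesystem is just a dict of path to contents, a RUN step is a Python function, and the names `build`, `add_group`, and `add_user` are hypothetical.

```python
from typing import Callable, Dict, List

Filesystem = Dict[str, str]          # path -> file contents
Step = Callable[[Filesystem], None]  # a "RUN" step mutates a filesystem in place


def build(base: Filesystem, steps: List[Step]) -> List[Filesystem]:
    """Return one filesystem snapshot per layer, base layer first."""
    snapshots = [dict(base)]
    for step in steps:
        container = dict(snapshots[-1])    # intermediate container from the top layer
        step(container)                    # run the step inside it
        snapshots.append(dict(container))  # save the result as the new top layer
        del container                      # the intermediate container is discarded
    return snapshots


def add_group(fs: Filesystem) -> None:
    fs["/etc/group"] = fs.get("/etc/group", "") + "dan\n"


def add_user(fs: Filesystem) -> None:
    fs["/etc/passwd"] = fs.get("/etc/passwd", "") + "dan\n"


snapshots = build({"/etc/passwd": "root\n"}, [add_group, add_user])
```

Only the snapshots survive the build; the working copies that played the role of intermediate containers are gone once each step has been saved.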

How layer filesystems work

You asked whether someone downloading your ubuntu-based image is only doing a partial download if they already have the ubuntu image locally.

Yes! That's exactly right.
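As a rough illustration of why the download is partial, the pull can be thought of as a set difference between the layers an image needs and the layers already stored locally. The function name and the layer IDs below are hypothetical, and a real pull involves more than this.

```python
from typing import Iterable, List, Set


def layers_to_download(image_layers: Iterable[str], local_layers: Set[str]) -> List[str]:
    """Return the layer IDs of an image that are not already stored locally."""
    return [layer for layer in image_layers if layer not in local_layers]


# Hypothetical IDs: the first two layers belong to the shared base image.
my_image = ["base-1", "base-2", "tools-1"]
already_pulled = {"base-1", "base-2"}

missing = layers_to_download(my_image, already_pulled)
```

The pull output in the question, where every layer is reported as "Already exists", is the extreme case where nothing is missing.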

Every layer uses the layer beneath it as a base. The new layer is basically a diff between that layer and a new state. It's not a diff in the same way as a git commit might work, though. It works at the file level, not at the line level.

Say you started from ubuntu, and you ran this Dockerfile:

    FROM ubuntu:16.04
    RUN groupadd dan && useradd -g dan dan

This would result in a two-layer image. The first layer would be the ubuntu image. The second would probably have only a handful of changes:

A newer copy of /etc/passwd with user "dan"
A newer copy of /etc/group with group "dan"
A new directory /home/dan
A couple of default files like /home/dan/.bashrc

And that's it. If you start a container from this image, those few files would be in the topmost layer, and everything else would come from the filesystem in the ubuntu image.
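A small Python sketch of that lookup, assuming each layer is just a dict of the files it adds or replaces (the contents are invented and heavily shortened, and `read_file` is a hypothetical helper, not a Docker API):

```python
from typing import Dict, List, Optional

Layer = Dict[str, str]  # path -> contents of the files this layer adds or replaces


def read_file(layers: List[Layer], path: str) -> Optional[str]:
    """Look a path up from the topmost layer down; the first hit wins."""
    for layer in reversed(layers):
        if path in layer:
            return layer[path]
    return None  # no layer provides this path


ubuntu_layer = {
    "/etc/passwd": "root",
    "/etc/group": "root",
    "/bin/sh": "<shell binary>",
}
dan_layer = {
    "/etc/passwd": "root dan",  # whole file replaced, not patched line by line
    "/etc/group": "root dan",
    "/home/dan/.bashrc": "# default bashrc",
}

image = [ubuntu_layer, dan_layer]
```

Reading `/etc/passwd` from `image` finds the newer copy in the top layer, while `/bin/sh` falls through to the ubuntu layer untouched.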

The top-most read-write layer in a container

One other point. When you run a container, you can write files in the filesystem. But if you stop the container and run another container from the same image, everything is reset. So where are the files written?

Images are immutable, so once they are created, they can't be changed. You can build a new version, but that's a new image. It would have a different ID and would not be the same image.

A container has a top-level read-write layer which is put on top of the image layers. Any writes happen in that layer. It works just like the other layers. If you need to modify a file (or add one, or delete one), that is done in the top layer, and doesn't affect the lower layers. If the file exists already, it is copied into the read-write layer, and then modified. This is known as copy-on-write (CoW).
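Copy-on-write can be modelled the same way. In this sketch the `Container` class is hypothetical, layers are plain dicts, and deletions are not modelled at all; it only shows that writes land in a private top layer.

```python
from typing import Dict, List, Optional

Layer = Dict[str, str]  # path -> file contents


class Container:
    """Toy container: read-only image layers plus a private read-write layer."""

    def __init__(self, image_layers: List[Layer]) -> None:
        self._image_layers = image_layers  # shared between containers, never modified
        self._rw: Layer = {}               # per-container top layer

    def read(self, path: str) -> Optional[str]:
        """Check the read-write layer first, then the image layers from the top down."""
        for layer in [self._rw] + list(reversed(self._image_layers)):
            if path in layer:
                return layer[path]
        return None

    def append(self, path: str, text: str) -> None:
        """Modify a file: copy it up into the read-write layer, then change the copy."""
        current = self.read(path) or ""
        self._rw[path] = current + text


image = [{"/etc/motd": "hello\n"}]
first = Container(image)
first.append("/etc/motd", "edited\n")
second = Container(image)  # a fresh container from the same image
```

The first container sees its edit, the second container sees the original file, and the image layers themselves are unchanged.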

Where to add changes

Do you have to add new things to the bottom of the Dockerfile? No, you can add anything anywhere (or change anything).

However, how you do things does affect your build times because of how the build caching works.

Docker will try to cache results during builds. If it finds as it reads through the Dockerfile that the FROM is the same, the first RUN is the same, the second RUN is the same... it will assume it has already done those steps, and will use cached results. If it encounters something that is different from the last build, it will invalidate the cache. Everything from that point on will be re-run fresh.
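The "everything from that point on" behaviour can be sketched by chaining cache keys, so that each step's key depends on all the steps before it. This is a deliberately simplified model that looks only at the instruction text; the `build` function and the hashing scheme are assumptions for illustration, not how Docker stores its cache.

```python
import hashlib
from typing import List, Set


def build(steps: List[str], cache: Set[str]) -> List[str]:
    """Return the steps that had to be executed; cached steps are skipped.

    Each step's cache key covers the step itself and every step before it,
    so a change anywhere also changes the keys of all later steps.
    """
    executed = []
    key = ""
    for step in steps:
        key = hashlib.sha256((key + "\n" + step).encode()).hexdigest()
        if key not in cache:
            executed.append(step)
            cache.add(key)
    return executed


cache: Set[str] = set()
original = [
    "FROM python:3.5",
    "RUN pip install --upgrade pip",
    "ENV APP_ENVIRONMENT L",
]
first_build = build(original, cache)  # nothing cached yet, every step runs

# Appending a step at the end reuses everything before it.
appended = build(original + ["ENV TEST_ENV_VAR TEST"], cache)

# Changing an early step re-runs that step and everything after it.
changed_early = build(
    [
        "FROM python:3.5",
        "RUN pip install --upgrade pip setuptools",
        "ENV APP_ENVIRONMENT L",
    ],
    cache,
)
```

In this model, appending a line costs one step, while editing the second line costs two, even though the third line itself did not change.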

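Some steps are checked more strictly than others. For ADD and COPY, Docker also computes a checksum of the files being copied and compares it with the one stored for the cached layer, so the cache is invalidated whenever any of those files has changed since the last build. For other instructions such as RUN, Docker only keeps track of what the build command is. It doesn't try to figure out "would running this command give the same result as last time?"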

So it is a common practice to start with FROM, then put very static things like RUN commands that install packages with e.g. apt-get, etc. Those things tend to not change a lot after your Dockerfile has been initially written. Later in the file is a more convenient place to put things that change more often.

It's hard to concisely give good advice on this, because it really depends on the project in question. But it pays to learn how the build caching works and try to take advantage of it.

answered Oct 09 '22 13:10

Dan Lowe
