What is the practical purpose of VOLUME in Dockerfile?

Tags:

docker

dockerfile

First of all, I want to make it clear I've done due diligence in researching this topic. Very closely related is this SO question, which doesn't really address my confusion.

I understand that when VOLUME is specified in a Dockerfile, this instructs Docker to create an unnamed volume for each container started from the image, mounted at the specified directory inside it. For example:

# Dockerfile
VOLUME ["/foo"]

This would create a volume to contain any data stored in /foo inside the container. The volume (when viewed via docker volume ls) would show up as a long random hexadecimal ID.

Each time you do docker run, this volume is not reused. This is the key point causing confusion here. To me, the goal of a volume is to contain state persistent across all instances of an image (all containers started from it). So basically if I do this, without explicit volume mappings:

#!/usr/bin/env bash
# Run container for the first time. --name lets the later commands refer to
# the container as `foo`; -d keeps the script from blocking on it.
docker run -d -t --name foo foo

# Kill the container and re-run it again. Note that the previous
# volume would now contain data because services running in `foo`
# would have written data to that volume.
docker container stop foo
docker container rm foo

# Run container a second time
docker run -d -t --name foo foo

I expect the unnamed volume to be reused between the 2 run commands. However, this is not the case. Because I did not explicitly map a volume via the -v option, a new volume is created for each run.
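One way to check this observation is to list the volumes attached to each container. The following is a minimal sketch, not part of the original scripts: `container_volumes` is a helper name made up for illustration, and it assumes the containers were started with known names (for example via `--name`).

```shell
#!/usr/bin/env bash
# Print the names of the volumes mounted into container $1, one per line.
# For a VOLUME declared in the Dockerfile and no -v flag, the name is a
# random ID generated by Docker. Bind mounts have no name and print as
# empty lines.
container_volumes() {
  docker inspect --format '{{ range .Mounts }}{{ println .Name }}{{ end }}' "$1"
}
```

Calling it for two containers started from the same image without -v should print two different IDs, which matches the behavior described above.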

Here's important part number 2: Since I'm required to explicitly specify -v to share persistent state between run commands, why would I ever specify VOLUME in my Dockerfile? Without VOLUME, I can do this (using the previous example):

#!/usr/bin/env bash
# Create a volume for state persistence
docker volume create foo_data

# Run container for the first time. --name lets the later commands refer to
# the container as `foo`; -d keeps the script from blocking on it.
docker run -d -t --name foo -v foo_data:/foo foo

# Kill the container and re-run it again. Note that the previous
# volume would now contain data because services running in `foo`
# would have written data to that volume.
docker container stop foo
docker container rm foo

# Run container a second time
docker run -d -t --name foo -v foo_data:/foo foo

Now, truly, the second container will have data mounted to /foo that was there from the previous instance. I can do this without VOLUME in my Dockerfile. From the command line, I can turn any directory inside the container into a mount to either a bound directory on the host or a volume in Docker.
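The same two options can also be written with the longer --mount syntax, which spells out whether the mount is a named volume or a bind mount. This is only a sketch: the function names are hypothetical, and `foo` and `foo_data` are the image and volume from the example above.

```shell
#!/usr/bin/env bash
# Start a container from image $1 with the named volume $2 mounted at $3.
run_with_named_volume() {
  docker run -d -t --mount "type=volume,source=$2,target=$3" "$1"
}

# Start a container from image $1 with the host directory $2 bind-mounted at $3.
# With --mount, the host directory must be an absolute path that already exists.
run_with_bind_mount() {
  docker run -d -t --mount "type=bind,source=$2,target=$3" "$1"
}
```

For example, `run_with_named_volume foo foo_data /foo` is the --mount equivalent of the `-v foo_data:/foo` run above. Neither form needs a VOLUME instruction in the image.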

So my question is: What is the point of VOLUME when you have to explicitly map named volumes to containers via commands on the host anyway? Either I'm missing something or this is just confusing and obfuscated.

Note that all of my assertions here are based on my observations of how docker behaves, as well as what I've gathered from the documentation.

588

asked Sep 29 '18 15:09

void.pointer

1 Answer

Instructions like VOLUME and EXPOSE are a bit anachronistic. Named volumes as we know them today were introduced in Docker 1.9 (November 2015).

Before Docker 1.9, running a container whose image had one or more VOLUME instructions (or using the --volume option) was the only way to create volumes for data sharing or persistence. In fact, it used to be a best practice to create data-only containers whose sole purpose was to hold one or more volumes, and then share those volumes with your application containers using the --volumes-from option. Here are some articles that describe this outdated pattern.

Docker Data Containers
Why Docker Data Containers (Volumes!) are Good

Also, check out moby/moby#17798 (Data-only containers obsolete with docker 1.9.0?) where the change from data-only containers to named volumes was discussed.
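For readers who never used it, the data-only container pattern looked roughly like the sketch below. The function names and the choice of `busybox` as the base image are assumptions for illustration only; `foo` stands for the application image from the question.

```shell
#!/usr/bin/env bash
# Create (but never start) a container named $1 whose only job is to own
# an anonymous volume at /foo.
create_data_container() {
  docker create -v /foo --name "$1" busybox
}

# Run the application image $2 with every volume of data container $1 attached.
run_with_data_container() {
  docker run -d -t --volumes-from "$1" "$2"
}
```

The data survived as long as the data container was not removed, which is the job a named volume does directly today.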

Today, I consider the VOLUME instruction an advanced tool that should only be used for specialized cases, and only after careful thought. For example, the official postgres image declares a VOLUME at /var/lib/postgresql/data. This can improve the performance of postgres containers out of the box by keeping the database data out of the layered filesystem: Docker doesn't have to search through all the layers of the container image to serve file requests at /var/lib/postgresql/data.

However, the VOLUME instruction does come at a cost.

Users might not be aware that unnamed volumes are being created, or that those volumes continue to take up storage space on the Docker host after the containers are removed.
There is no way to remove a volume declared in a Dockerfile. Downstream images cannot add data to paths where volumes exist.
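For the first cost, leftover unnamed volumes can be found and avoided with standard Docker CLI commands. A minimal sketch, with hypothetical function names wrapping each command:

```shell
#!/usr/bin/env bash
# List the IDs of volumes that are no longer referenced by any container.
list_dangling_volumes() {
  docker volume ls --quiet --filter dangling=true
}

# Remove container $1 together with the anonymous volumes attached to it.
remove_container_and_volumes() {
  docker rm --volumes "$1"
}

# Run a throwaway container from image $1; --rm also removes its
# anonymous volumes when the container exits.
run_ephemeral() {
  docker run --rm -t "$1"
}
```

Named volumes are left alone by both `docker rm --volumes` and `--rm`, so state that was mapped explicitly with -v is not affected.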

The latter issue results in problems like these.

How to “undeclare” volumes in docker image?
GitLab on Docker: how to persist user data between deployments?

For the GitLab question, someone wants to extend the GitLab image with pre-configured data for testing purposes, but it's impossible to commit that data in a downstream image because of the VOLUME at /var/opt/gitlab in the parent image.

tl;dr: VOLUME was designed for a world before Docker 1.9. Best to just leave it out.

117

answered Oct 11 '22 02:10

King Chung Huang
