I've been playing around with Docker for a while and keep running into the same issue when dealing with persistent data.
I create my Dockerfile and expose a volume, or use `--volumes-from` to mount a host folder inside my container.
What permissions should I apply to the shared volume on the host?
I can think of two options:

1. So far I've given everyone read/write access, so I can write to the folder from the Docker container.
2. Map the users from the host into the container, so I can assign more granular permissions. I'm not sure this is possible, though, and haven't found much about it. So far, all I can do is run the container as some user: `docker run -i -t --user="myuser" postgres`, but this user has a different UID than my host `myuser`, so permissions do not work. Also, I'm unsure whether mapping the users would pose security risks.
Are there other alternatives?
How are you guys/gals dealing with this issue?
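To confirm the mismatch described in the second option, you can compare a user's UID on the host with a user's UID inside the image. A minimal sketch; the helper name `compare_uids` is hypothetical, and it assumes the image ships the `id` utility (as Debian-based images such as `postgres` do):

```sh
# Hypothetical helper: compare a host user's UID with a user's UID inside an image.
# Usage: compare_uids HOST_USER IMAGE CONTAINER_USER
# Prints both IDs; returns 0 when they match, non-zero otherwise.
compare_uids() {
  host_uid=$(id -u "$1")
  # Override the image's entrypoint so the container only runs `id -u USER`.
  container_uid=$(docker run --rm --entrypoint id "$2" -u "$3")
  echo "host $1: $host_uid, $2 $3: $container_uid"
  [ "$host_uid" = "$container_uid" ]
}
```

For example, `compare_uids myuser postgres postgres` shows whether the host's `myuser` and the image's `postgres` user share a UID.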
You can manage volumes using Docker CLI commands or the Docker API. Volumes work on both Linux and Windows containers. Volumes can be more safely shared among multiple containers than bind mounts. Volume drivers let you store volumes on remote hosts or cloud providers, encrypt the contents of volumes, or add other functionality.
Sharing and persisting data in Docker containers is handled by Docker volumes. A volume can be created together with a container, or created beforehand and then attached to new containers. There are several ways to share data between containers.
Volumes are stored in a part of the host filesystem that is managed by Docker (`/var/lib/docker/volumes/` on Linux). Non-Docker processes should not modify this part of the filesystem. Volumes are the preferred way to persist data in Docker. Bind mounts, by contrast, may be stored anywhere on the host system.
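For example, the CLI can create a named volume and report where Docker stores it on the host. A minimal sketch; the helper name and the volume name are hypothetical:

```sh
# Hypothetical helper: create a named volume and print its location on the host.
# Usage: create_and_locate_volume VOLUME_NAME
create_and_locate_volume() {
  docker volume create "$1" >/dev/null
  # .Mountpoint is the directory on the host that backs the volume.
  docker volume inspect --format '{{ .Mountpoint }}' "$1"
}
```

On a Linux host, `create_and_locate_volume graphitedata` typically prints something like `/var/lib/docker/volumes/graphitedata/_data`; `docker volume ls` and `docker volume rm` list and delete volumes.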
Docker volumes are used to persist data from within a Docker container. There are a few different types of Docker volumes: host, anonymous, and named. Knowing the differences and when to use each type can be difficult.
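The three types differ only in the `-v` argument passed to `docker run`. A minimal sketch; the helper name, the `mydata` volume and the `debian:jessie` image are placeholders:

```sh
# Hypothetical helper: mount something at /data in three different ways
# and list the mount point each time.
show_volume_types() {
  # Host (bind) mount: a host directory (created by Docker if missing) appears at /data.
  docker run --rm -v "$PWD/data:/data" debian:jessie ls -ld /data
  # Anonymous volume: Docker creates a volume with a random name;
  # --rm removes it again when the container exits.
  docker run --rm -v /data debian:jessie ls -ld /data
  # Named volume: created on first use and kept until `docker volume rm mydata`.
  docker run --rm -v mydata:/data debian:jessie ls -ld /data
}
```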
UPDATE 2016-03-02: As of Docker 1.9.0, Docker has named volumes, which replace data-only containers. The answer below, as well as my linked blog post, still has value in terms of how to think about data inside Docker, but consider using named volumes to implement the pattern described below rather than data containers.
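As a sketch of that suggestion, here is the graphite setup from the answer below with a named volume standing in for the data-only container. The image names come from the answer; the helper and the container name `graphite` are hypothetical. It assumes `some/graphite` creates `/data/graphite` owned by the `graphite` user (as `some/graphitedata` does), because with the default local driver Docker copies the image's contents at that path, ownership included, into an empty named volume when it is first mounted.

```sh
# Hypothetical helper: the graphite pattern with a named volume instead of
# a data-only container.
run_graphite_with_named_volume() {
  docker volume create graphitedata >/dev/null
  # Long-running service container using the volume.
  docker run -d --name graphite -v graphitedata:/data/graphite some/graphite
  # Throwaway tools container sharing the same volume.
  docker run -ti --rm -v graphitedata:/data/graphite some/graphitetools
}
```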
I believe the canonical way to solve this is by using data-only containers. With this approach, all access to the volume data is via containers that use `--volumes-from` the data container, so the host uid/gid doesn't matter.
For example, one use case given in the documentation is backing up a data volume. To do this, another container is used to do the backup via `tar`, and it too uses `--volumes-from` in order to mount the volume. So I think the key point to grok is: rather than thinking about how to get access to the data on the host with the proper permissions, think about how to do whatever you need (backups, browsing, etc.) via another container. The containers themselves need to use consistent uid/gids, but they don't need to map to anything on the host, thereby remaining portable.
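A sketch of that backup, modeled on the backup example in the Docker documentation; the helper name is hypothetical, and the data container name and path are parameters you supply:

```sh
# Hypothetical helper: archive a data container's volume into ./backup.tar.
# Usage: backup_volume DATA_CONTAINER DATA_PATH
backup_volume() {
  data_container=$1
  data_path=$2
  # --volumes-from mounts the data container's volumes; the current host
  # directory is bind-mounted at /backup so the archive lands there.
  docker run --rm --volumes-from "$data_container" -v "$(pwd):/backup" \
    debian:jessie tar cvf /backup/backup.tar "$data_path"
}
```

For the graphite example below this would be `backup_volume graphitedata /data/graphite`. Note that `tar` runs as root here, so `backup.tar` ends up owned by root on the host.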
This is relatively new for me as well but if you have a particular use case feel free to comment and I'll try to expand on the answer.
UPDATE: For the use case given in the comments, you might have an image `some/graphite` to run graphite, and an image `some/graphitedata` as the data container. So, ignoring ports and such, the Dockerfile of image `some/graphitedata` is something like:
```dockerfile
FROM debian:jessie
# add our user and group first to make sure their IDs get assigned consistently, regardless of other deps added later
RUN groupadd -r graphite \
  && useradd -r -g graphite graphite
RUN mkdir -p /data/graphite \
  && chown -R graphite:graphite /data/graphite
VOLUME /data/graphite
USER graphite
CMD ["echo", "Data container for graphite"]
```
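Build and create the data container (run `docker build` from the directory containing this Dockerfile; it takes a build context, not a file name):

```sh
docker build -t some/graphitedata .
docker run --name graphitedata some/graphitedata
```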
The `some/graphite` Dockerfile should also use the same uid/gids, so it might look something like this:
```dockerfile
FROM debian:jessie
# add our user and group first to make sure their IDs get assigned consistently, regardless of other deps added later
RUN groupadd -r graphite \
  && useradd -r -g graphite graphite
# ... graphite installation ...
VOLUME /data/graphite
USER graphite
CMD ["/bin/graphite"]
```
And it would be run as follows:

```sh
docker run --volumes-from=graphitedata some/graphite
```
OK, now that gives us our graphite container and associated data-only container with the correct user/group (note you could re-use the `some/graphite` image for the data container as well, overriding the entrypoint/cmd when running it, but having them as separate images is clearer IMO).
Now, let's say you want to edit something in the data folder. Rather than bind-mounting the volume to the host and editing it there, create a new container to do that job. Let's call it `some/graphitetools`, and let's also create the appropriate user/group, just like in the `some/graphite` image.
```dockerfile
FROM debian:jessie
# add our user and group first to make sure their IDs get assigned consistently, regardless of other deps added later
RUN groupadd -r graphite \
  && useradd -r -g graphite graphite
VOLUME /data/graphite
USER graphite
CMD ["/bin/bash"]
```
You could make this DRY by inheriting from `some/graphite` or `some/graphitedata` in the Dockerfile, or, instead of creating a new image, just re-use one of the existing ones (overriding entrypoint/cmd as necessary).
Now, you simply run:

```sh
docker run -ti --rm --volumes-from=graphitedata some/graphitetools
```

and then `vi /data/graphite/whatever.txt`. This works perfectly because all the containers have the same `graphite` user with matching uid/gid.
Since you never mount `/data/graphite` from the host, you don't care how the host uid/gid maps to the uid/gid defined inside the `graphite` and `graphitetools` containers. Those containers can now be deployed to any host, and they will continue to work perfectly.
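To check that the uid/gid really line up, you can run one-off commands in the tools container instead of an interactive shell. A minimal sketch using the image and container names above; the helper name is hypothetical. `ls -ln` prints numeric IDs, so you can compare them with the output of `id graphite`:

```sh
# Hypothetical helper: show numeric ownership of the shared data and the
# IDs of the graphite user in the tools image.
show_graphite_ownership() {
  docker run --rm --volumes-from=graphitedata some/graphitetools ls -ln /data/graphite
  docker run --rm some/graphitetools id graphite
}
```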
The neat thing about this is that `graphitetools` could contain all sorts of useful utilities and scripts, which you can now also deploy in a portable manner.
UPDATE 2: After writing this answer, I decided to write a more complete blog post about this approach. I hope it helps.
UPDATE 3: I corrected this answer and added more specifics. It previously contained some incorrect assumptions about ownership and perms: the ownership is usually assigned at volume creation time, i.e. in the data container, because that is when the volume is created. See this blog. This is not a requirement, though. You can just use the data container as a "reference/handle" and set the ownership/perms in another container via `chown` in an entrypoint, which ends with `gosu` to run the command as the correct user. If anyone is interested in this approach, please comment and I can provide links to a sample using this approach.
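A sketch of that entrypoint approach for the graphite image, assuming `gosu` is installed in the image; the file name `docker-entrypoint.sh` is illustrative:

```sh
# Write a hypothetical entrypoint next to the graphite Dockerfile.
cat > docker-entrypoint.sh <<'EOF'
#!/bin/sh
set -e
# When started as root, fix ownership of the volume, then drop privileges.
if [ "$(id -u)" = '0' ]; then
  chown -R graphite:graphite /data/graphite
  # gosu runs the command as the graphite user, replacing this shell.
  exec gosu graphite "$@"
fi
# Already running as a non-root user (e.g. via --user): just run the command.
exec "$@"
EOF
chmod +x docker-entrypoint.sh
```

In the Dockerfile you would then `COPY` the script into the image, point `ENTRYPOINT` at it and drop the `USER graphite` line, since the container has to start as root for `chown` to succeed.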
A very elegant solution can be seen in the official redis image, and in general in many of the official images.
Described as a step-by-step process:

1. Create the `redis` user and group first. As the Dockerfile comments say: "add our user and group first to make sure their IDs get assigned consistently, regardless of whatever dependencies get added".
2. Install `gosu`. gosu is an alternative to `su` / `sudo` for easily stepping down from the root user. (Redis is always run as the `redis` user.)
3. Create the `/data` volume and set it as the workdir. By configuring the `/data` volume with the `VOLUME /data` command, we now have a separate volume that can be either a Docker volume or bind-mounted to a host directory. Configuring it as the workdir (`WORKDIR /data`) makes it the default directory from which commands are executed.
4. Set `docker-entrypoint` as the `ENTRYPOINT`, with `redis-server` as the default `CMD`. This means that all container executions run through the docker-entrypoint script, and by default the command to be run is `redis-server`. `docker-entrypoint` is a script with a simple job: change ownership of the current directory (`/data`) and step down from `root` to the `redis` user to run `redis-server`. (If the executed command is not `redis-server`, it runs the command directly.)
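A minimal sketch of an entrypoint that behaves as described, modeled on (not copied from) the script the official image used; the current official script differs, so check the image's repository before relying on it. It assumes `gosu` is installed and `WORKDIR` is `/data`:

```sh
# Write a hypothetical docker-entrypoint.sh for a redis-style image.
cat > docker-entrypoint.sh <<'EOF'
#!/bin/sh
set -e
# Only the default server command gets the ownership fix and step-down.
if [ "$1" = 'redis-server' ]; then
  # WORKDIR is /data, so "." is the (possibly bind-mounted) data volume.
  chown -R redis .
  # Drop from root to the redis user and replace this shell with the server.
  exec gosu redis "$@"
fi
# Any other command (sh, redis-cli, ...) runs directly.
exec "$@"
EOF
chmod +x docker-entrypoint.sh
```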
This has the following effect: if the `/data` directory is bind-mounted to the host, the docker-entrypoint script prepares the user permissions before running `redis-server` as the `redis` user.

This gives you peace of mind that no setup is needed to run the container under any volume configuration.
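To see this effect from the host, you can bind-mount an empty directory and look at its numeric owner after the container starts. A sketch; the directory and container names are hypothetical, and the UID shown will be whatever the image's `redis` user has:

```sh
# Hypothetical helper: bind-mount ./redis-data at /data and show its numeric
# owner on the host once the entrypoint has run.
show_redis_bind_mount_owner() {
  mkdir -p "$PWD/redis-data"
  docker run -d --name redis-bind -v "$PWD/redis-data:/data" redis >/dev/null
  # Give the entrypoint a moment to chown /data before inspecting it.
  sleep 2
  ls -ldn "$PWD/redis-data"
}
```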
Of course, if you need to share the volume between different images, you need to make sure they use the same UID/GID; otherwise whichever container starts last will re-own the files with its own user and take the permissions away from the previous one.