Huge files in Docker containers

I need to create a Docker image (and consequently containers from that image) that uses large files (genomic data, reaching ~10GB in size).

How am I supposed to optimize their usage? Should I include them in the image (e.g. COPY large_folder large_folder_in_container)? Is there a better way of referencing such files? It seems strange to me to push such an image (which would be >10GB) to my private repository. I wonder if there is a way of attaching some sort of volume to the container, without packing all those GBs together.

Thank you.

asked Sep 15 '16 11:09 by Eleanore



1 Answer

Is there a better way of referencing such files?

If you already have some way to distribute the data to your hosts, I would use a "bind mount" to make the host directory available inside the containers.

docker run -v /path/to/data/on/host:/path/to/data/in/container <image> ...

That way you can change the image and you won't have to re-download the large data set each time.
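As a rough sketch of the bind mount approach, the helper below only prints the docker run command so you can inspect it before running it. The function name bind_mount_cmd, the image name my-app-image and the --rm and :ro options are my own additions for illustration, not something the setup above requires; :ro mounts the data read-only, which seems reasonable for reference data that the container should not modify.

```shell
#!/bin/sh
# Dry-run sketch: compose a `docker run` command that bind-mounts a host
# directory read-only. Paths and image name are placeholders (assumptions).

bind_mount_cmd() {
    host_dir=$1       # directory on the host that holds the large files
    container_dir=$2  # path where the application expects the data
    image=$3          # application image (hypothetical name)
    # ":ro" makes the mount read-only inside the container;
    # "--rm" removes the container once it exits.
    printf 'docker run --rm -v %s:%s:ro %s\n' "$host_dir" "$container_dir" "$image"
}

# Print the command for inspection only; nothing is executed here.
bind_mount_cmd /path/to/data/on/host /path/to/data/in/container my-app-image
```

Because the data never enters the image, rebuilding or re-pulling the application image stays small and fast. Paths containing spaces would need extra quoting, which this sketch does not handle.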

If you want to use the registry to distribute the large data set, but manage changes to the data set separately from the application image, you could use a data volume container with a Dockerfile like this:

FROM tianon/true
COPY dataset /dataset
VOLUME /dataset

From your application container you can attach that volume using:

docker run -d --name dataset <data volume image name>
docker run --volumes-from dataset <image> ...
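To show how the pieces might fit together end to end, here is a dry-run sketch that prints the steps in order: build and push the data image once, then pull it and attach it on each host. The function name dataset_steps, the registry path registry.example.com/genomics/dataset:v1 and the image name my-app-image are hypothetical, and the --rm flag is my addition.

```shell
#!/bin/sh
# Dry-run sketch: print the steps for publishing a data-only image and
# attaching its volume to an application container.
# All image names passed in are hypothetical placeholders.

dataset_steps() {
    data_image=$1  # image built from the data-only Dockerfile
    app_image=$2   # application image
    cat <<EOF
docker build -t $data_image .
docker push $data_image
docker pull $data_image
docker run -d --name dataset $data_image
docker run --rm --volumes-from dataset $app_image
EOF
}

# Print the commands for inspection only; nothing is executed here.
dataset_steps registry.example.com/genomics/dataset:v1 my-app-image
```

Note that the data set still goes through the registry once per version; what you gain is that changes to the application image no longer drag the ~10GB layer along with them.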

Either way, I think volumes (https://docs.docker.com/engine/tutorials/dockervolumes/) are what you want.

answered Sep 19 '22 14:09 by dnephin