My service needs some large files (~100 MB-500 MB) when it is running. These files might change once in a while, and I don't mind rebuilding my container and re-deploying it when that happens.
I'm wondering what the best way is to store them and use them during the build, so that anyone on the team can update the container and rebuild it.
My best idea so far is to store these large files in Git LFS, with a different branch for each version, so that I can add this to my Dockerfile:
RUN git clone -b 'version_2.0' --single-branch --depth 1 https://...git.git
This way, if these large files change, I just need to change version_2.0 in the Dockerfile and rebuild.
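For reference, a rough sketch of what I have in mind, assuming git and git-lfs are installed in the build stage; the base image, repository URL and target path here are just placeholders:

FROM python:3.10-slim
# git and git-lfs are needed so the clone pulls the real LFS objects, not pointer files
RUN apt-get update && apt-get install -y --no-install-recommends git git-lfs \
 && rm -rf /var/lib/apt/lists/* \
 && git lfs install
# bumping the branch name is what forces this layer to rebuild and pull the new weights
RUN git clone -b 'version_2.0' --single-branch --depth 1 \
    https://example.com/model-weights.git /opt/model/weights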
Is there any other recommended way? I also considered storing these files in Dropbox and just fetching them with wget via a shared link during the build.
P.S. These large files are the weights of a deep network.
Edit: The question is what a reasonable way is to store large files for a Docker image, such that one developer/team can change a file and the matching code, the change is documented (git), and the result can easily be used and even deployed by another team (for this reason, keeping the large files only on a local PC is bad, because they would have to be sent to the other team).
When building an image, you can't mount a volume. However, you can copy data from another image. By combining this with a multi-stage build, you can pre-compute an expensive operation once and re-use the resulting state as a starting point for future iterations.
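For example, something along these lines (the image names, paths and weights URL are placeholders, not your actual setup): the first stage fetches the weights once, and the final stage copies the result, so edits to the application layers don't repeat the expensive step.

FROM alpine:3.19 AS weights
# the expensive step (download or precomputation) lives in its own stage
ADD https://example.com/weights-v2.tar.xz /weights/weights-v2.tar.xz

FROM python:3.10-slim
# re-use the precomputed state; COPY --from also accepts another image, e.g. --from=my-registry/model-weights:2.0
COPY --from=weights /weights /opt/model/weights
COPY app/ /app
CMD ["python", "/app/serve.py"]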
Docker builds images automatically by reading the instructions from a Dockerfile -- a text file that contains, in order, all the commands needed to build a given image. The build command, like other docker commands, is client/server based: the build may run on a remote server with no access to the machine running the docker command, so to support copying files into your image you pass the build context as the last argument to the build command.
That context directory may contain many other unrelated files and directories, which can force Docker to scan and send a lot of data and slow the build down. Another approach is therefore to create a base image that already contains the external files and extend it afterward (a sketch follows below).
Each instruction creates one layer:
1. FROM creates a layer from the ubuntu:18.04 Docker image.
2. COPY adds files from your Docker client's current directory.
3. RUN builds your application with make.
4. CMD specifies what command to run within the container.
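A rough sketch of the base-image idea, as two separate Dockerfiles; all image names and paths here are placeholders. The large files are baked into a base image that is rebuilt only when they change, and the application image simply extends it, with each instruction adding one layer on top.

# Dockerfile.weights -- rebuilt and pushed only when the weights change
FROM python:3.10-slim
COPY weights/ /opt/model/weights/

# Dockerfile -- the application image extends the weights base image
FROM my-registry/model-weights:2.0
COPY app/ /app
RUN pip install --no-cache-dir -r /app/requirements.txt
CMD ["python", "/app/serve.py"]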
It actually comes down to how you build your container. For example, we build our containers with Jenkins and the fabric8.io plugin as part of the Maven build, and we use ADD with a remote source URL (Nexus).
In general, you can use a URL as the source, so it depends on which storage you have access to:
1. You can create an S3 bucket and give your Docker build node access to it, then pull the files in with ADD http://example.com/big.tar.xz /usr/src/things/ in your Dockerfile (see the sketch after this list).
2. You can upload the large files to an artifact repository (such as Nexus or Artifactory) and reference them in ADD the same way.
3. If you are building with Jenkins, create a folder on the same host, configure the web server to serve that content with a virtual-host config, and then use that URL.
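For options 1 and 2, a rough sketch could look like this, assuming the file is reachable over plain HTTPS from the build node (for a private S3 bucket you would typically use a pre-signed URL or the AWS CLI instead); the URL is a placeholder passed in as a build argument:

FROM python:3.10-slim
# the artifact location is passed at build time, for example:
#   docker build --build-arg WEIGHTS_URL=https://nexus.example.com/repository/raw-models/weights-2.0.tar.xz .
ARG WEIGHTS_URL=https://example.com/big.tar.xz
# ADD can fetch a remote URL directly; remote downloads are not auto-extracted,
# so unpack explicitly afterwards if needed
ADD ${WEIGHTS_URL} /usr/src/things/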
The optimal solution is the one that is cheapest in terms of effort and cost without compromising on security.
I feel that I must be misreading your question, because the answer seems blindingly obvious to me, but none of the other respondents are mentioning it. So please kindly forgive me if I am vastly misinterpreting your problem.
If your service needs large files at runtime and they change from time to time, as you say:
"These files might change once in a while, and I don't mind rebuilding my container and re-deploying it when it happens."
then source control is not the best fit for such artifacts.
A binary artifact storage service, like Nexus or Artifactory (both have free editions and publish their own Docker images if you need one), is better suited to this task.
From there, your Dockerfile can fetch your large file(s) from Nexus/Artifactory.
See here for proper caching and cache invalidation.
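A rough sketch of such a fetch, assuming a raw/hosted repository reachable over HTTPS; the Nexus URL, repository name and version below are placeholders. Keeping the version in a build argument means the download layer stays cached until you bump it:

FROM python:3.10-slim
# bump this (or pass --build-arg WEIGHTS_VERSION=2.1) to invalidate the cached download layer
ARG WEIGHTS_VERSION=2.0
RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates \
 && rm -rf /var/lib/apt/lists/* \
 && curl -fSL "https://nexus.example.com/repository/raw-models/weights-${WEIGHTS_VERSION}.tar.gz" \
      -o /tmp/weights.tar.gz \
 && mkdir -p /opt/model/weights \
 && tar -xzf /tmp/weights.tar.gz -C /opt/model/weights \
 && rm /tmp/weights.tar.gz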