My service needs some large files (~100 MB-500 MB) when it is running. These files might change once in a while, and I don't mind rebuilding my container and re-deploying it when that happens.
I'm wondering what the best way is to store them and use them during the build, so that anyone on the team can update the container and rebuild it.
My best idea so far is to store these large files in Git LFS, with a different branch for each version, so that I can add this to my Dockerfile:
RUN git clone -b 'version_2.0' --single-branch --depth 1 https://...git.git
This way, if these large files change, I just need to change version_2.0 in the Dockerfile and rebuild.
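For reference, a rough sketch of what I have in mind, assuming git and git-lfs are installed in the build stage; the base image, repository URL and target path here are just placeholders:

FROM python:3.10-slim
# git and git-lfs are needed so the clone pulls the real LFS objects, not pointer files
RUN apt-get update && apt-get install -y --no-install-recommends git git-lfs \
 && rm -rf /var/lib/apt/lists/* \
 && git lfs install
# bumping the branch name is what forces this layer to rebuild and pull the new weights
RUN git clone -b 'version_2.0' --single-branch --depth 1 \
    https://example.com/model-weights.git /opt/model/weights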
Is there any other recommended way? I also considered storing these files in Dropbox and just fetching them with wget via a shared link during the build.
P.S. These large files are the weights of a deep network.
Edit: The question is what a reasonable way is to store large files for a Docker image, such that one developer/team can change a file and the matching code, the change is documented (git), and the result can easily be used and even deployed by another team (for this reason, keeping the large files only on a local PC is bad, because they would have to be sent to the other team).
When building an image, you can't mount a volume. However, you can copy data from another image. By combining this with a multi-stage build, you can pre-compute an expensive operation once and re-use the resulting state as a starting point for future iterations.
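For example, something along these lines (the image names, paths and weights URL are placeholders, not your actual setup): the first stage fetches the weights once, and the final stage copies the result, so edits to the application layers don't repeat the expensive step.

FROM alpine:3.19 AS weights
# the expensive step (download or precomputation) lives in its own stage
ADD https://example.com/weights-v2.tar.xz /weights/weights-v2.tar.xz

FROM python:3.10-slim
# re-use the precomputed state; COPY --from also accepts another image, e.g. --from=my-registry/model-weights:2.0
COPY --from=weights /weights /opt/model/weights
COPY app/ /app
CMD ["python", "/app/serve.py"]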
Docker builds images automatically by reading the instructions from a Dockerfile -- a text file that contains, in order, all the commands needed to build a given image. The build command, like other docker commands, is client/server based: the build may run on a remote server with no access to the machine running the docker command, so to support copying files into your image you pass the build context as the last argument to the build command.
That context directory may contain many other unrelated files and directories, which can force Docker to scan and send a lot of data and slow the build down. Another approach is therefore to create a base image that already contains the external files and extend it afterward (a sketch follows below).
Each instruction creates one layer:
1. FROM creates a layer from the ubuntu:18.04 Docker image.
2. COPY adds files from your Docker client's current directory.
3. RUN builds your application with make.
4. CMD specifies what command to run within the container.
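A rough sketch of the base-image idea, as two separate Dockerfiles; all image names and paths here are placeholders. The large files are baked into a base image that is rebuilt only when they change, and the application image simply extends it, with each instruction adding one layer on top.

# Dockerfile.weights -- rebuilt and pushed only when the weights change
FROM python:3.10-slim
COPY weights/ /opt/model/weights/

# Dockerfile -- the application image extends the weights base image
FROM my-registry/model-weights:2.0
COPY app/ /app
RUN pip install --no-cache-dir -r /app/requirements.txt
CMD ["python", "/app/serve.py"]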
It actually comes down to how you build your container. For example, we build our containers with Jenkins and the fabric8.io plugin as part of the Maven build, and we use ADD with a remote source URL (Nexus).
In general, you can use a URL as the source, so it depends on which storage you have access to:
1. You can create an S3 bucket and give your Docker build node access to it, then pull the files in with ADD http://example.com/big.tar.xz /usr/src/things/ in your Dockerfile (see the sketch after this list).
2. You can upload the large files to an artifact repository (such as Nexus or Artifactory) and reference them in ADD the same way.
3. If you are building with Jenkins, create a folder on the same host, configure the web server to serve that content with a virtual-host config, and then use that URL.
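For options 1 and 2, a rough sketch could look like this, assuming the file is reachable over plain HTTPS from the build node (for a private S3 bucket you would typically use a pre-signed URL or the AWS CLI instead); the URL is a placeholder passed in as a build argument:

FROM python:3.10-slim
# the artifact location is passed at build time, for example:
#   docker build --build-arg WEIGHTS_URL=https://nexus.example.com/repository/raw-models/weights-2.0.tar.xz .
ARG WEIGHTS_URL=https://example.com/big.tar.xz
# ADD can fetch a remote URL directly; remote downloads are not auto-extracted,
# so unpack explicitly afterwards if needed
ADD ${WEIGHTS_URL} /usr/src/things/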
The optimal solution is the one that is cheapest in terms of effort and cost without compromising on security.
I feel that I must be misreading your question, because the answer seems blindingly obvious to me, but none of the other respondents are mentioning it. So please kindly forgive me if I am vastly misinterpreting your problem.
If your service needs large files at runtime and they change from time to time, as you say:
"These files might change once in a while, and I don't mind rebuilding my container and re-deploying it when it happens."
then source control is not the best fit for such artifacts.
A binary artifact storage service, like Nexus or Artifactory (both have free editions and publish their own Docker images if you need one), is better suited to this task.
From there, your Dockerfile can fetch your large file(s) from Nexus/Artifactory.
See here for proper caching and cache invalidation.
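A rough sketch of such a fetch, assuming a raw/hosted repository reachable over HTTPS; the Nexus URL, repository name and version below are placeholders. Keeping the version in a build argument means the download layer stays cached until you bump it:

FROM python:3.10-slim
# bump this (or pass --build-arg WEIGHTS_VERSION=2.1) to invalidate the cached download layer
ARG WEIGHTS_VERSION=2.0
RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates \
 && rm -rf /var/lib/apt/lists/* \
 && curl -fSL "https://nexus.example.com/repository/raw-models/weights-${WEIGHTS_VERSION}.tar.gz" \
      -o /tmp/weights.tar.gz \
 && mkdir -p /opt/model/weights \
 && tar -xzf /tmp/weights.tar.gz -C /opt/model/weights \
 && rm /tmp/weights.tar.gz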