 

Dockerfile strategies for Git

What is the best strategy to clone a private Git repository into a Docker container using a Dockerfile? Pros/Cons?

I know that I can add commands to a Dockerfile to clone my private repository into a Docker container, but I would like to know which different approaches people have used for this.

It’s not covered in the Dockerfile best practices guide.

asked Nov 12 '15 by Hemerson Varela

1 Answer

From Ryan Baumann's blog post “Git strategies for Docker”:

There are different strategies for getting your Git source code into a Docker build. Many of these have different ways of interacting with Docker’s caching mechanisms, and may be more or less appropriately suited to your project and how you intend to use Docker.

RUN git clone

If you’re like me, this is the approach that first springs to mind when you see the commands available to you in a Dockerfile. The trouble is that it can interact in several unintuitive ways with Docker’s build caching mechanisms. For example, if you make an update to your git repository and then re-run the docker build which has a RUN git clone command, you may or may not get the new commit(s), depending on whether the preceding Dockerfile commands have invalidated the cache.

One way to get around this is to use docker build --no-cache, but then if there are any time-intensive commands preceding the clone they’ll have to run again too.

Another issue is that you (or someone you’ve distributed your Dockerfile to) may unexpectedly come back to a broken build later on when the upstream git repository updates.

A two-birds-one-stone approach, while still using RUN git clone, is to put it on one line with a specific revision checkout, e.g.:

```dockerfile
RUN git clone https://github.com/example/example.git && cd example && git checkout 0123abcdef
```

Then updating the revision to check out in the Dockerfile will invalidate the cache at that line and cause the clone/checkout to run.

One possible drawback to this approach in general is that you have to have git installed in your container.
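
The pinned-checkout idea above can be sketched with a build argument so the revision lives in one place. This is an illustrative sketch, not from the original post: the base image, repo URL, and ARG name are all placeholders.

```dockerfile
# Illustrative sketch: repo URL and REVISION default are placeholders.
FROM alpine:3.19

# git must be present in the build container for this strategy
RUN apk add --no-cache git

# Changing REVISION (here or via --build-arg) invalidates the cache at
# this layer, so the clone/checkout below re-runs with the new commit.
ARG REVISION=0123abcdef
RUN git clone https://github.com/example/example.git /src \
 && git -C /src checkout "$REVISION"
```

Building with `docker build --build-arg REVISION=<sha> .` then lets you bump the pinned commit without editing the Dockerfile.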

RUN curl or ADD a tag/commit tarball URL

This avoids having to have git installed in your container environment, and can benefit from being explicit about when the cache will break (i.e. if the tag/revision is part of the URL, that URL change will bust the cache). Note that if you use the Dockerfile ADD command to copy from a remote URL, the file will be downloaded every time you run the build, and the HTTP Last-Modified header will also be used to invalidate the cache.

You can see this approach used in the golang Dockerfile.
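
As a rough sketch of the tarball approach (the URL and paths here are illustrative), the pinned revision sits in the URL itself, so changing it busts the cache, and git never needs to be installed:

```dockerfile
# Illustrative sketch: repo URL and commit hash are placeholders.
FROM alpine:3.19
RUN apk add --no-cache curl tar

# The commit hash is part of the URL, so bumping it invalidates the
# cache at this layer; no git is required in the image.
RUN mkdir -p /src \
 && curl -fsSL https://github.com/example/example/archive/0123abcdef.tar.gz \
    | tar -xz -C /src --strip-components=1
```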

Git submodules inside Dockerfile repository

If you keep your Dockerfile and Docker build in a separate repository from your source code, or your Docker build requires multiple source repositories, using git submodules (or git subtrees) in this repository may be a valid way to get your source repos into your build context. This avoids some concerns with Docker caching and upstream updating, as you lock the upstream revision in your submodule/subtree specification. Updating them will break your Docker cache as it changes the build context.

Note that this only gets the files into your Docker build context; you still need to use ADD commands in your Dockerfile to copy those paths to where you expect them in the container.
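
A minimal sketch of that workflow (the repo URL and paths are illustrative): pin the source repo inside the repository that holds the Dockerfile, then copy it from the build context.

```shell
# In the repository that holds the Dockerfile (URL/path illustrative):
git submodule add https://github.com/example/example.git src/example
git commit -m "Pin example source as a submodule"
```

In the Dockerfile itself, an `ADD src/example /app/example` (or `COPY`) then places the submodule’s files where the build expects them; updating the submodule pointer changes the build context and busts the cache.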

You can see this approach used here.

Dockerfile inside git repository

Here, you just keep your Dockerfile in the same git repository alongside the code you want to build/test/deploy, so the code is automatically sent as part of the build context and you can e.g. ADD . /project to copy the context into the container. The advantage is that you can test changes without having to commit/push them just to get them into a test docker build; the disadvantage is that every time you modify any file in your working directory, the cache is invalidated at the ADD command. Sending the build context for a large source/data directory can also be time-consuming. If you use this approach, make judicious use of the .dockerignore file, for example by ignoring everything in your .gitignore and possibly the .git directory itself.
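
A minimal `.dockerignore` sketch along those lines (the entries are illustrative; mirror your own `.gitignore`):

```
# .dockerignore: shrink the build context and keep ADD . cache-friendly
.git
.gitignore
*.log
node_modules
build/
```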

Volume mapping

If you’re using Docker to set up a dev/test environment that you want to share among a wide variety of source repos on your host machine, mounting a host directory as a data volume may be a viable strategy. This gives you the ability to specify which directories you want to include at docker run-time, and avoids concerns about docker build caching, but none of this will be shared among other users of your Dockerfile or container image.
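
As a sketch (the image name and paths are illustrative), the mount is specified at run time rather than build time:

```shell
# Mount the current host directory into the container at /project;
# nothing about the source tree is baked into the image.
docker run --rm -it \
  -v "$PWD":/project \
  -w /project \
  my-dev-image:latest \
  sh
```

Because the directory is bind-mounted, edits on the host are visible inside the container immediately, with no rebuild; the trade-off is that this setup is not reproducible for anyone who only has your image.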

answered Oct 04 '22 by Hemerson Varela