How to only pull latest commit in a git submodule

Tags: git

Is there a way I can only pull the latest commit in a git submodule? I was trying to put boost as a git submodule in some projects but since the boost repo with everything included is really heavyweight I wanted to only update the submodules to the latest commit and not pull all commits. Is this possible?

For example, when I do

git submodule update --init --recursive

All the boost submodules get pulled with all their commits. Can I only ask the submodules to mirror the latest commit instead of pulling all changes?

Note: Shallow clones with the --depth flag do not work because that only pulls the latest commit, and the latest commit has only the changes made in that commit, so the repository is not in the right state.

Note: git archive (as suggested in an answer below) does not seem to work when I try the following sequence of commands:

mkdir temp-git-test
cd temp-git-test
git init
git submodule add --depth 1 https://github.com/boostorg/boost
cd boost
git archive --format=tar HEAD --output ../boost.tar.gz
cd ..
tar -xzvf boost.tar.gz

The output of the unzipped repo is the same as the submodule. Am I doing something wrong?

asked Nov 27 '16 21:11

Curious

2 Answers

The short answer is no. The long answer is maybe, but consider another way.

Shallow clones and shallow submodules

The long answer, which lets you get partway to what you want, starts with a technical note: you're not pulling, in Git terms. In Git, "pull" means "fetch, then merge-or-rebase" and you are not going to merge-or-rebase here. In fact, when you're "init"-ing you are generally going to make the initial clones.

Each submodule is actually its own repository.¹ Git is, sooner or later, going to do a git checkout within each of those repositories, asking it to check out, not a branch, but rather one specific commit, which is quite often not the latest commit. Given the nature of Git repositories and software development, and the idea that a submodule is, in the first place, a reference to a third-party repository, i.e., one you specifically do not and cannot control, the best you can do is say: "I know that my software works with one specific version of their software, and that version is <fill in the blank>." Thus, your repository lists the specific version you want from their repository.

Now we get to the heart of the problem. When you git clone a repository, or use git fetch to update an existing clone, you do so by asking for specific branch and/or tag names, rather than specific commit IDs. There is some (very limited) support for fetching specific IDs, but it must be enabled in that other repository, the one we just said that you do not and cannot control. Enabling fetch-by-ID is computationally expensive for them—whoever "they" are, the ones controlling the other repository—and not something you can do on your side, nor demand, nor is it enabled by default. This means that in general it's just not available.
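To make the fetch-by-ID point concrete, here is a minimal sketch that runs entirely against throwaway local repositories. The names upstream and client are made up for the demo, and the uploadpack.allowAnySHA1InWant setting is switched on in the "server" repository, which is exactly the step you cannot perform on a third-party host.

```shell
set -eu
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
cd "$work"

# Hypothetical upstream repository with three commits.
git init -q upstream
for n in 1 2 3; do
  echo "$n" > "upstream/file$n.txt"
  git -C upstream add .
  git -C upstream commit -q -m "commit $n"
done

# Server-side opt-in: only the owner of the upstream repository can set this.
git -C upstream config uploadpack.allowAnySHA1InWant true

# Ask for an older commit by its raw hash, one commit deep.
wanted=$(git -C upstream rev-parse HEAD~1)
git init -q client
git -C client fetch -q --depth 1 "file://$work/upstream" "$wanted"

fetched=$(git -C client rev-parse FETCH_HEAD)
depth=$(git -C client rev-list --count FETCH_HEAD)
echo "wanted=$wanted fetched=$fetched depth=$depth"
```

Whether a real hosting service accepts such a request depends on its configuration and on the protocol version in use, so treat this as an illustration of the mechanism rather than something to rely on.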

In any case, git clone only works with names: you may git clone -b branch url, for instance, to make your new clone start by checking out that specific branch, or git clone -b tag url to make your new clone start by checking out (as a detached HEAD) that specific tag. Despite this "check out a specific branch or tag", though, the clone defaults to cloning all the names offered by the remote, and making a full-depth (i.e., non-shallow) clone.

All of this does mean something important. First, shallow clones exist. A shallow clone is one made with a --depth argument. It can be deepened by a git fetch with another --depth. The "depth" is the number of commits fetched "beyond" the commit(s) identified by the name(s) used during the clone or fetch, with some fairly complicated rules. (The details of these rules don't matter much here.)
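As a quick check of what a depth-1 clone actually contains, the sketch below builds a hypothetical three-commit repository (upstream) and clones it shallowly. Every Git commit records a complete snapshot of the tree, not just a diff, so the shallow clone should have one commit of history but all three files.

```shell
set -eu
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
cd "$work"

# Hypothetical upstream: each commit adds one new file.
git init -q upstream
for n in 1 2 3; do
  echo "$n" > "upstream/file$n.txt"
  git -C upstream add .
  git -C upstream commit -q -m "commit $n"
done

# A file:// URL is used because --depth is ignored for plain local paths.
git clone -q --depth 1 "file://$work/upstream" shallow

commits=$(git -C shallow rev-list --count HEAD)   # history visible in the clone
files=$(ls shallow | wc -l | tr -d ' ')           # files in the working tree
echo "commits=$commits files=$files"
```

This prints commits=1 files=3: the history is truncated, but the checked-out tree is complete.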

Second, because shallow clones exist, shallow submodules also exist. A shallow submodule is simply a submodule that is cloned with --depth. But there is a problem: there is no easy or obvious way to determine what depth is needed. You can pass a --depth argument to git submodule add or git submodule update, but it's not obvious how deep you should go.

Here's the problem: your submodule will be cloned, perhaps by a branch or tag name, but then your submodule will be told to check out one particular commit (by its raw hash ID). Will that commit be in the clone? What depth guarantees that it will? If you are cloning by tag name, and the tag always names the correct commit, you can use --depth 1 (and hence you can use --shallow-submodules during the initial git clone as well), but that only works if the tag really does always name the commit your superproject has recorded.
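Here is a rough sketch of the easy case, where the recorded commit happens to be the tip that a depth-1 clone fetches. The repositories lib, super and fresh are hypothetical local stand-ins, and protocol.file.allow=always is only there because recent Git versions refuse file:// submodule URLs by default.

```shell
set -eu
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
cd "$work"

# "lib" stands in for the third-party repository.
git init -q lib
for n in 1 2 3; do
  echo "$n" > "lib/file$n.txt"
  git -C lib add .
  git -C lib commit -q -m "lib commit $n"
done

# Superproject that adds lib as a shallow submodule.
git init -q super
git -C super -c protocol.file.allow=always \
    submodule --quiet add --depth 1 "file://$work/lib" lib
git -C super commit -q -m "add lib as a shallow submodule"
sub_commits=$(git -C super/lib rev-list --count HEAD)

# A fresh clone of the superproject, asking for shallow submodules too.
git -c protocol.file.allow=always clone -q \
    --recurse-submodules --shallow-submodules "file://$work/super" fresh
fresh_commits=$(git -C fresh/lib rev-list --count HEAD)
echo "sub_commits=$sub_commits fresh_commits=$fresh_commits"
```

If the superproject pinned an older commit of lib instead of its tip, a depth of 1 would no longer be guaranteed to contain it, which is the difficulty described above.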

¹What's special about these sub-repositories is that they are:

listed in the outer repository (in a .gitmodules file);
generally kept in "detached HEAD" mode;
and detached at a commit whose ID is stored in the outer repository.

The .gitmodules file lists the names and URLs for the various submodules. "Initializing" a submodule amounts to copying stuff from .gitmodules to the configuration file for the containing superproject, and "updating" a submodule usually amounts to cloning or fetching. The commit at which the submodule is to be detached is recorded in the superproject's repository as a "gitlink" entry in a tree object.
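To see the two pieces of bookkeeping described in this footnote, the following sketch (again with hypothetical local repositories lib and super) prints the .gitmodules file and the gitlink entry. A gitlink shows up in git ls-tree with mode 160000 and type commit.

```shell
set -eu
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
cd "$work"

# "lib" stands in for the third-party repository.
git init -q lib
echo hello > lib/readme.txt
git -C lib add .
git -C lib commit -q -m "initial lib commit"

# Superproject with lib as a submodule (file:// must be allowed explicitly).
git init -q super
git -C super -c protocol.file.allow=always \
    submodule --quiet add "file://$work/lib" lib
git -C super commit -q -m "add lib"

# Name, path and URL live in .gitmodules ...
cat super/.gitmodules

# ... while the pinned commit is a gitlink entry in the superproject's tree.
entry=$(git -C super ls-tree HEAD lib)
echo "$entry"
mode=$(echo "$entry" | awk '{print $1}')
pinned=$(echo "$entry" | awk '{print $3}')
```

The hash in the gitlink entry is the commit that the submodule will be detached at, independent of whatever lib's branches point to later.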

Submodule support has grown rather complex in modern versions of Git though, so now there are more things you can do when doing the update step.

Reference clones

There is a much better, more general solution for many cases. Instead of fussing with shallow clones, you can point Git at a reference clone. The reference clone is any clone of the repository you're trying to clone.² Ideally, it's a recent and reasonably up-to-date clone of the repository you are cloning, but any clone will do.

What Git does with a reference clone is a bit complicated (see the documentation for details), but the short version is that when cloning some repository, instead of getting all the objects over the network from some distant server (which may be slow and/or rate-limited), your Git will ask the distant server what objects and such it needs, then look at your local³ reference clone to see if it already has those objects. If so, it will "borrow" them from the reference clone.

This lets you obtain a full, complete, up-to-date clone while using very little network and storage resources, since you will no longer need to bring (most or all of) the data over, nor (unless you use --dissociate) store it yourself. That in turn means you need not worry about your shallow clone being too shallow: you just get one slow full clone, then reference the heck out of it for all other clones, which go fast. Using reference clones can cut the time to clone a few big GitHub repositories from an hour-plus down to tens of seconds, for instance.
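A small sketch of the reference-clone idea, using hypothetical local repositories: upstream plays the distant server, reference is the one full clone you already paid for, and fast is the new clone that borrows from it. The borrowing is recorded in the new clone's objects/info/alternates file.

```shell
set -eu
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
cd "$work"

# Hypothetical "distant" repository.
git init -q upstream
for n in 1 2 3; do
  echo "$n" > "upstream/file$n.txt"
  git -C upstream add .
  git -C upstream commit -q -m "commit $n"
done

# The one slow, full clone that later clones will borrow from.
git clone -q "file://$work/upstream" reference

# A new clone that borrows objects from the reference.
git clone -q --reference "$work/reference" "file://$work/upstream" fast

# The alternates file points at the reference clone's object store.
alternates=$(cat fast/.git/objects/info/alternates)
echo "borrowing objects from: $alternates"
```

git submodule add and git submodule update accept a --reference option as well, so the same trick can be applied to submodule clones.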

²Technically, the reference could be any repository at all. A repository not actually related to the one you are cloning is going to make a lousy reference, though: it will have none of the objects you need, and will provide no speedup at all. (It could even have the wrong data under the object's name, although the chances of this are vanishingly small. This cannot happen if the reference is correct since object names cannot be reused this way.)

³The reference should be "as local as possible" for speed, but does not really have to be on your machine, just accessible. If the reference will not always be present you will probably want to add --dissociate, so that the objects get copied from the reference clone into the new clone. This uses more disk space, of course.
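For the case where the reference may disappear, here is a sketch of --dissociate with the same kind of hypothetical local repositories (upstream, reference, standalone). The new clone copies what it borrowed, so it should keep working after the reference is deleted.

```shell
set -eu
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
cd "$work"

# Hypothetical "distant" repository and a full reference clone of it.
git init -q upstream
for n in 1 2 3; do
  echo "$n" > "upstream/file$n.txt"
  git -C upstream add .
  git -C upstream commit -q -m "commit $n"
done
git clone -q "file://$work/upstream" reference

# Borrow from the reference during the clone, then copy the objects locally.
git clone -q --reference "$work/reference" --dissociate \
    "file://$work/upstream" standalone

# The reference can now go away without breaking the new clone.
rm -rf reference
commits=$(git -C standalone rev-list --count HEAD)
echo "commits=$commits"
```

The trade-off is the one noted above: the dissociated clone stores its own copy of every object.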

answered Sep 22 '22 08:09

torek

"Note Shallow clones with the --depth flag do not work because that only pulls the latest commit, and the latest commit has only the changes made in that commit, so the repository is not in the right state."

Then combine a git archive of the boost repo with a shallow clone setting for your submodule:

your submodule is still shallow;
but you then overwrite its incomplete content with the complete content of a git archive image of the same repo, making the working tree an exact replica of the remote repo at that SHA1.

From there, each (shallow) refresh builds on content that was already complete, so it remains up to date.

git archive is done in a local clone of the repo:

git archive --format=tar HEAD

If you don't have a local clone, but the boost repo is on GitHub (like, for instance, boostorg/boost), then you can get a compressed image of the current HEAD with a simple curl (no need for git archive then).

As seen in the comments, adding the content of an archive is of no use, as the archive contains exactly the same content as the commit.

However, this seems incomplete:

git submodule add --depth 1 https://github.com/boostorg/boost

For git submodule update --remote to work (i.e., to fetch the latest commit instead of keeping the initially checked-out SHA1), you would need:

git submodule add -b master --depth 1 https://github.com/boostorg/boost

Then a git submodule update --init --recursive --remote would fetch the latest commit.

See "Git submodules: Specify a branch/tag".
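The branch-tracking setup can be tried offline with a sketch like this one. The repositories lib and super are hypothetical local stand-ins for boostorg/boost and your project, the branch is called main instead of master purely for the demo, and protocol.file.allow=always is only needed because the submodule URL is a file:// path.

```shell
set -eu
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
cd "$work"

# "lib" stands in for the third-party repository, on a branch named main.
git init -q lib
git -C lib symbolic-ref HEAD refs/heads/main
echo 1 > lib/file1.txt
git -C lib add .
git -C lib commit -q -m "lib commit 1"

# Superproject: shallow submodule that tracks lib's main branch.
git init -q super
git -C super -c protocol.file.allow=always \
    submodule --quiet add -b main --depth 1 "file://$work/lib" lib
git -C super commit -q -m "add lib tracking main"

# The third-party repository moves on.
echo 2 > lib/file2.txt
git -C lib add .
git -C lib commit -q -m "lib commit 2"

# --remote follows the tracked branch instead of the recorded SHA1.
git -C super -c protocol.file.allow=always \
    submodule --quiet update --init --recursive --remote --depth 1

sub_head=$(git -C super/lib rev-parse HEAD)
lib_head=$(git -C lib rev-parse HEAD)
echo "submodule=$sub_head upstream=$lib_head"
```

After the update the superproject sees lib as modified; the new commit is only recorded once you git add lib and commit in the superproject.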

answered Sep 21 '22 08:09

VonC
