
What is the difference between "git submodule foreach git pull origin master" and "git pull origin master --recurse-submodules"

I have a dotfiles repository where all my vim plugins are stored as submodules so they are easy to update when they have changes. I thought these two commands did the same thing, but apparently that is not the case.

I knew I had updates to pull down in several submodules so I ran git pull origin master --recurse-submodules from the root of the parent repository. It appeared to iterate over each submodule, but only fetch updates from their origin repositories.

When I ran git submodule foreach git pull origin master then it actually ran git pull origin master within each repository, doing both the fetch and the merge.

What is the point of using --recurse-submodules? I'm a little confused about what it's actually trying to do and Google was a bit cryptic with what I found. I thought maybe you smart folks would have a simpler explanation.
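A minimal sketch that reproduces this, assuming Bash and a reasonably recent Git. Everything runs in a throwaway temporary directory with its own HOME, and the repository names (plugin-up, dotfiles-up, work) are hypothetical stand-ins for a vim plugin and the dotfiles repo. Only the plugin gets a new upstream commit, so the parent repository itself has nothing new to merge:

```shell
#!/usr/bin/env bash
# Reproduce the question: "pull --recurse-submodules" vs "submodule foreach git pull".
set -euo pipefail

# Throwaway sandbox with its own HOME/config, removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp/.config" GIT_CONFIG_NOSYSTEM=1
unset GIT_CONFIG_GLOBAL
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master
git config --global pull.rebase false           # plain merges, no pull warning
git config --global protocol.file.allow always  # allow local-path submodule clones

commit_file() {  # commit_file DIR TEXT: write TEXT to DIR/file.txt and commit it
  echo "$2" > "$1/file.txt"
  git -C "$1" add file.txt
  git -C "$1" commit -q -m "$2"
}
same() { [ "$1" = "$2" ] && echo yes || echo no; }

# Hypothetical upstreams: a vim plugin, and a dotfiles repo with it as submodule "plugin".
git init -q "$tmp/plugin-up";   commit_file "$tmp/plugin-up" v1
git init -q "$tmp/dotfiles-up"; commit_file "$tmp/dotfiles-up" dotfiles
git -C "$tmp/dotfiles-up" submodule --quiet add "$tmp/plugin-up" plugin
git -C "$tmp/dotfiles-up" commit -q -m "add plugin"
git clone -q --recurse-submodules "$tmp/dotfiles-up" "$tmp/work"

# The plugin gets a new upstream commit; the dotfiles repo records nothing new.
commit_file "$tmp/plugin-up" v2
new=$(git -C "$tmp/plugin-up" rev-parse HEAD)

cd "$tmp/work"
git pull -q --recurse-submodules origin master
fetched=$(git -C plugin rev-parse origin/master)  # the submodule's remote-tracking ref
head_after_pull=$(git -C plugin rev-parse HEAD)   # the commit actually checked out

git submodule --quiet foreach git pull origin master
head_after_foreach=$(git -C plugin rev-parse HEAD)

echo "pull --recurse-submodules fetched v2:       $(same "$fetched" "$new")"
echo "pull --recurse-submodules checked out v2:   $(same "$head_after_pull" "$new")"
echo "submodule foreach git pull checked out v2:  $(same "$head_after_foreach" "$new")"
```

If it behaves as described in the question, the plugin's origin/master moves after the first command while its checked-out commit does not; only the foreach run moves it.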

asked Oct 29 '15 at 17:10 by Chev



2 Answers

That option is mainly for fetching all the submodule commits, not just pulling one specific branch like master, for reasons detailed in the following two commits.
(Note that a related bug was fixed in Git 2.11; see the end of this answer.)

For git pull, this option was introduced in commit 7dce19d (Nov. 2010, Git 1.7.4-rc0):

fetch/pull: Add the --recurse-submodules option

Until now you had to call "git submodule update" (without -N|--no-fetch option) or something like "git submodule foreach git fetch" to fetch new commits in populated submodules from their remote.

This could lead to "(commits not present)" messages in the output of "git diff --submodule" (which is used by "git gui" and "gitk") after fetching or pulling new commits in the superproject and is an obstacle for implementing recursive checkout of submodules.
Also "git submodule update" cannot fetch changes when disconnected, so it was very easy to forget to fetch the submodule changes before disconnecting only to discover later that they are needed.

This patch adds the "--recurse-submodules" option to recursively fetch each populated submodule from the url configured in the .git/config of the submodule at the end of each "git fetch" or during "git pull" in the superproject. The submodule paths are taken from the index.


Commit 88a2197 (March 2011, git 1.7.5-rc1) explains a bit more:

fetch/pull: recurse into submodules when necessary

To be able to access all commits of populated submodules referenced by the superproject, it is sufficient to only then let "git fetch" recurse into a submodule when the new commits fetched in the superproject record new commits for it.

  • Having these commits present is extremely useful when using the "--submodule" option to "git diff" (which is what "git gui" and "gitk" do since 1.6.6), as all submodule commits needed for creating a descriptive output can be accessed.
  • Also merging submodule commits (added in 1.7.3) depends on the submodule commits in question being present to work.
  • Last but not least this enables disconnected operation when using submodules, as all commits necessary for a successful "git submodule update -N" will have been fetched automatically.

So we choose this mode as the default for fetch and pull.
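A small sketch of that "on-demand" default, under the same kind of assumptions (Bash, a recent Git, a disposable sandbox, hypothetical repo names, and neither fetch.recurseSubmodules nor submodule.recurse configured). The same plain "git fetch" is run twice: once after the submodule moved upstream without the superproject recording it, and once after a superproject commit records the new submodule commit:

```shell
#!/usr/bin/env bash
# Sketch of the default "on-demand" recursion of a plain "git fetch".
set -euo pipefail

# Throwaway sandbox with its own HOME/config, removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp/.config" GIT_CONFIG_NOSYSTEM=1
unset GIT_CONFIG_GLOBAL
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master
git config --global pull.rebase false
git config --global protocol.file.allow always  # allow local-path submodule clones

commit_file() {  # commit_file DIR TEXT: write TEXT to DIR/file.txt and commit it
  echo "$2" > "$1/file.txt"
  git -C "$1" add file.txt
  git -C "$1" commit -q -m "$2"
}
same() { [ "$1" = "$2" ] && echo yes || echo no; }

# Hypothetical upstreams: "super-up" has "sub-up" as submodule "sub".
git init -q "$tmp/sub-up";   commit_file "$tmp/sub-up" v1
git init -q "$tmp/super-up"; commit_file "$tmp/super-up" super
git -C "$tmp/super-up" submodule --quiet add "$tmp/sub-up" sub
git -C "$tmp/super-up" commit -q -m "add sub"
git clone -q --recurse-submodules "$tmp/super-up" "$tmp/work"
cd "$tmp/work"

# Step 1: sub moves upstream, but no superproject commit records it.
commit_file "$tmp/sub-up" v2
v2=$(git -C "$tmp/sub-up" rev-parse HEAD)
git fetch -q                                  # default mode: on-demand
after_step1=$(git -C sub rev-parse origin/master)

# Step 2: a new superproject commit records sub at v2.
git -C "$tmp/super-up/sub" pull -q origin master
git -C "$tmp/super-up" add sub
git -C "$tmp/super-up" commit -q -m "record sub at v2"
git fetch -q                                  # same command; now it recurses into sub
after_step2=$(git -C sub rev-parse origin/master)

echo "sub fetched after step 1: $(same "$after_step1" "$v2")"
echo "sub fetched after step 2: $(same "$after_step2" "$v2")"
```

An explicit --recurse-submodules (or --no-recurse-submodules) on the command line overrides this default for one invocation, and the fetch.recurseSubmodules setting changes it persistently.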


git pull origin master --recurse-submodules 
git submodule foreach git pull origin master

The first one should pull, not just fetch, and be equivalent to the second one. Maybe this is a parameter order issue:

git pull --recurse-submodules origin master 

However, this is not the recommended way to update submodules to a given branch: see the following section.


Note that the right way to actually pull from master would be to register the master branch for the submodule, so that the submodule tracks master:

git config -f .gitmodules submodule.<path>.branch <branch>

Then a simple git submodule update --remote --recursive would be enough.
And the branch to fetch/pull is recorded in the parent repo (in the .gitmodules file), so you don't even have to remember which branch you want your submodule to update against.
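A sketch of that workflow in a disposable sandbox, assuming Bash and a recent Git. The repository names are hypothetical, and "plugin" is both the submodule's name and its path, which is what "git submodule add" uses by default:

```shell
#!/usr/bin/env bash
# Sketch: record the branch in .gitmodules, then "git submodule update --remote".
set -euo pipefail

# Throwaway sandbox with its own HOME/config, removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp/.config" GIT_CONFIG_NOSYSTEM=1
unset GIT_CONFIG_GLOBAL
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master
git config --global protocol.file.allow always  # allow local-path submodule clones

commit_file() {  # commit_file DIR TEXT: write TEXT to DIR/file.txt and commit it
  echo "$2" > "$1/file.txt"
  git -C "$1" add file.txt
  git -C "$1" commit -q -m "$2"
}

# Hypothetical upstreams: "dotfiles-up" has "plugin-up" as submodule "plugin".
git init -q "$tmp/plugin-up";   commit_file "$tmp/plugin-up" v1
git init -q "$tmp/dotfiles-up"; commit_file "$tmp/dotfiles-up" dotfiles
git -C "$tmp/dotfiles-up" submodule --quiet add "$tmp/plugin-up" plugin
git -C "$tmp/dotfiles-up" commit -q -m "add plugin"
git clone -q --recurse-submodules "$tmp/dotfiles-up" "$tmp/work"
cd "$tmp/work"

# Record which branch "plugin" follows (its submodule name equals its path here).
git config -f .gitmodules submodule.plugin.branch master
git commit -q -am "plugin follows master"

# The plugin gets a new upstream commit.
commit_file "$tmp/plugin-up" v2
v2=$(git -C "$tmp/plugin-up" rev-parse HEAD)

# Fetch each submodule and check out the tip of its recorded branch.
git submodule --quiet update --remote --recursive
plugin_head=$(git -C plugin rev-parse HEAD)

# The parent repo now sees a changed submodule; record it like any other change.
git add plugin
git commit -q -m "update plugin"
recorded=$(git rev-parse HEAD:plugin)

echo "branch in .gitmodules: $(git config -f .gitmodules submodule.plugin.branch)"
echo "plugin checkout:       $plugin_head"
echo "recorded in HEAD:      $recorded"
echo "upstream v2:           $v2"
```

Note that --remote only moves the submodule's checkout: the parent repo still has to record the new submodule commit with "git add" and "git commit", as the last step does.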


Update Git 2.11 (Q4 2016)

Having a submodule whose ".git" repository is somehow corrupt caused a few commands that recurse into submodules to loop forever.

See commit 10f5c52 (01 Sep 2016) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit 293c232, 12 Sep 2016)

This 2016 commit was extended in Git 2.21 (Q1 2019): "git fetch --recurse-submodules" may not fetch the necessary commit that is bound to the superproject, which is getting corrected.

See commit be76c21 (06 Dec 2018), and commit a62387b, commit 26f80cc, commit d5498e0, commit bcd7337, commit 16dd6fe, commit 08a297b, commit 25e3d28, commit 161b1cf (28 Nov 2018) by Stefan Beller (stefanbeller).
(Merged by Junio C Hamano -- gitster -- in commit 5d3635d, 29 Jan 2019)

submodule.c: fetch in submodules git directory instead of in worktree

Signed-off-by: Stefan Beller

Keep the properties introduced in 10f5c52656 ("submodule: avoid auto-discovery in prepare_submodule_repo_env()", 2016-09-01, Git v2.11.0-rc0 -- merge listed in batch #1), by fixating the git directory of the submodule.

But... "git fetch" did not work correctly with nested submodules where the innermost submodule, which is not of interest, got updated upstream; this has been corrected with Git 2.30 (Q4 2020).

See commit 1b7ac4e (12 Nov 2020) by Peter Kaestle (dscho).
(Merged by Junio C Hamano -- gitster -- in commit d627bf6, 25 Nov 2020)

submodules: fix of regression on fetching of non-init subsub-repo

Signed-off-by: Peter Kaestle

A regression has been introduced by a62387b ("submodule.c: fetch in submodules git directory instead of in worktree", 2018-11-28, Git v2.21.0-rc0 -- merge listed in batch #4).

The scenario in which it triggers is when one has a remote repository with a subrepository inside a subrepository like this: superproject/middle_repo/inner_repo

Person A and B have both a clone of it, while Person B is not working with the inner_repo and thus does not have it initialized in his working copy.

Now person A introduces a change to the inner_repo and propagates it through the middle_repo and the superproject.

Once person A pushed the changes and person B wants to fetch them using "git fetch" on superproject level, B's git call will return with error saying:

Could not access submodule 'inner_repo'
Errors during submodule fetch:
    middle_repo

Expectation is that in this case the inner submodule will be recognized as uninitialized subrepository and skipped by the git fetch command.

This used to work correctly before 'a62387b ("submodule.c: fetch in submodules git directory instead of in worktree", 2018-11-28, Git v2.21.0-rc0 -- merge listed in batch #4)'.

Starting with a62387b the code wants to evaluate "is_empty_dir()" inside .git/modules for a directory only existing in the worktree, delivering then of course wrong return value.

This patch reverts the changes of a62387b and introduces a regression test.


Warning: an earlier attempt to fix "git fetch --recurse-submodules" broke another use case, so it was reverted in Git 2.30 (Q4 2020) until a better fix could be found.

See commit 7091499 (02 Dec 2020) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit f3e5dcd, 03 Dec 2020)

Revert "submodules: fix of regression on fetching of non-init subsub-repo"

This reverts commit 1b7ac4e6d4d490b224f5206af7418ed74e490608. Ralf Thielow reports that "git fetch" with submodule.recurse set can result in a bogus and infinitely recursive fetching of the same submodule.


With Git 2.30.1 (Q1 2021), the "git fetch --recurse-submodules" fix lands again (second attempt).

See commit 505a276 (09 Dec 2020) by Peter Kaestle (dscho).
(Merged by Junio C Hamano -- gitster -- in commit c977ff4, 06 Jan 2021)

submodules: fix of regression on fetching of non-init subsub-repo

Signed-off-by: Peter Kaestle
CC: Junio C Hamano
CC: Philippe Blain
CC: Ralf Thielow
CC: Eric Sunshine
Reviewed-by: Philippe Blain

A regression has been introduced by a62387b ("submodule.c: fetch in submodules git directory instead of in worktree", 2018-11-28, Git v2.21.0-rc0 -- merge listed in batch #4).

The scenario in which it triggers is when one has a repository with a submodule inside a submodule like this: superproject/middle_repo/inner_repo

Person A and B have both a clone of it, while Person B is not working with the inner_repo and thus does not have it initialized in his working copy.

Now person A introduces a change to the inner_repo and propagates it through the middle_repo and the superproject.

Once person A pushed the changes and person B wants to fetch them using "git fetch" at the superproject level, B's git call will return with error saying:

Could not access submodule 'inner_repo'
Errors during submodule fetch:
    middle_repo

Expectation is that in this case the inner submodule will be recognized as uninitialized submodule and skipped by the git fetch command.

This used to work correctly before 'a62387b ("submodule.c: fetch in submodules git directory instead of in worktree", 2018-11-28, Git v2.21.0-rc0 -- merge listed in batch #4)'.

Starting with a62387b the code wants to evaluate "is_empty_dir()" inside .git/modules for a directory only existing in the worktree, delivering then of course wrong return value.

This patch ensures is_empty_dir() is getting the correct path of the uninitialized submodule by concatenation of the actual worktree and the name of the uninitialized submodule.

The first attempt to fix this regression, in 1b7ac4e ("submodules: fix of regression on fetching of non-init subsub-repo", 2020-11-12, Git v2.30.0-rc0 -- merge listed in batch #8), by simply reverting a62387b, resulted in an infinite loop of submodule fetches in the simpler case of a recursive fetch of a superproject with uninitialized submodules, and so this commit was reverted in 7091499 (Revert "submodules: fix of regression on fetching of non-init subsub-repo", 2020-12-02, Git v2.30.0-rc0 -- merge listed in batch #10).
To prevent future breakages, also add a regression test for this scenario.
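A sketch of the superproject/middle_repo/inner_repo scenario described above, assuming Bash and a Git release that includes this fix (2.30.1 or later). Person A's pushes are simulated by committing directly in local "upstream" repositories; apart from middle_repo and inner_repo, the names are hypothetical. On a fixed Git the recursive fetch is expected to succeed and simply skip the uninitialized inner_repo:

```shell
#!/usr/bin/env bash
# Sketch of the superproject/middle_repo/inner_repo scenario from the commit message.
set -euo pipefail

# Throwaway sandbox with its own HOME/config, removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp/.config" GIT_CONFIG_NOSYSTEM=1
unset GIT_CONFIG_GLOBAL
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master
git config --global pull.rebase false
git config --global protocol.file.allow always  # allow local-path submodule clones

commit_file() {  # commit_file DIR TEXT: write TEXT to DIR/file.txt and commit it
  echo "$2" > "$1/file.txt"
  git -C "$1" add file.txt
  git -C "$1" commit -q -m "$2"
}

# Hypothetical upstreams: superproject -> middle_repo -> inner_repo.
git init -q "$tmp/inner-up";  commit_file "$tmp/inner-up" v1
git init -q "$tmp/middle-up"; commit_file "$tmp/middle-up" middle
git -C "$tmp/middle-up" submodule --quiet add "$tmp/inner-up" inner_repo
git -C "$tmp/middle-up" commit -q -m "add inner_repo"
git init -q "$tmp/super-up";  commit_file "$tmp/super-up" super
git -C "$tmp/super-up" submodule --quiet add "$tmp/middle-up" middle_repo
git -C "$tmp/super-up" commit -q -m "add middle_repo"

# Person B initializes middle_repo only; inner_repo stays an empty directory.
git clone -q "$tmp/super-up" "$tmp/person-b"
git -C "$tmp/person-b" submodule --quiet update --init

# Person A changes inner_repo and propagates it through middle_repo and superproject.
commit_file "$tmp/inner-up" v2
git -C "$tmp/middle-up/inner_repo" pull -q origin master
git -C "$tmp/middle-up" add inner_repo
git -C "$tmp/middle-up" commit -q -m "bump inner_repo"
git -C "$tmp/super-up/middle_repo" pull -q origin master
git -C "$tmp/super-up" add middle_repo
git -C "$tmp/super-up" commit -q -m "bump middle_repo"

# Person B fetches at the superproject level; inner_repo should just be skipped.
if git -C "$tmp/person-b" fetch -q --recurse-submodules; then
  fetch_status=ok
else
  fetch_status=failed
fi
echo "git fetch --recurse-submodules: $fetch_status"
```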


"git fetch --recurse-submodules" from multiple remotes (either from a remote group, or "--all") used to make one extra "git fetch" in the submodules, which has been corrected with Git 2.37 (Q2 2022).

See commit 0353c68 (16 May 2022) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit fa61b77, 25 May 2022)

fetch: do not run a redundant fetch from submodule

Reviewed-by: Glen Choo

When 7dce19d ("fetch/pull: Add the --recurse-submodules option", 2010-11-12, Git v1.7.4-rc0 -- merge) introduced the "--recurse-submodules" option, the approach taken was to perform fetches in submodules only once, after all the main fetching (it may usually be a fetch from a single remote, but it could be fetching from a group of remotes using fetch_multiple()) succeeded.
Later we added "--all" to fetch from all defined remotes, which complicated things even more.

If your project has a submodule, and you try to run "git fetch --recurse-submodules --all", you'd see a fetch for the top-level, which invokes another fetch for the submodule, followed by another fetch for the same submodule.
All but the last fetch for the submodule come from a "git fetch --recurse-submodules" subprocess that is spawned via the fetch_multiple() interface for the remotes, and the last fetch comes from the code at the end.

Because recursive fetching from submodules is done in each fetch for the top-level in fetch_multiple(), the last fetch in the submodule is redundant.
It only matters when fetch_one() interacts with a single remote at the top-level.

While we are at it, there is one optimization that exists in dealing with a group of remotes, but is missing when "--all" is used.
In the former, when the group turns out to be a group of one, instead of spawning "git fetch" as a subprocess via the fetch_multiple() interface, we use the normal fetch_one() code path.
Do the same when handling "--all", if it turns out that we have only one remote defined.

answered Nov 15 '22 at 17:11 by VonC



What is the point of using --recurse-submodules?

--recurse-submodules will do submodules within a submodule (it's actually recursive). git submodule foreach git pull origin master will not; it only does the immediate submodules.
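A sketch of that difference with a nested layout (super, middle and inner are hypothetical names), assuming Bash and a recent Git. Note that "git submodule foreach" also accepts its own --recursive flag, which makes it descend into nested submodules too:

```shell
#!/usr/bin/env bash
# Sketch: "git submodule foreach" with and without --recursive on nested submodules.
set -euo pipefail

# Throwaway sandbox with its own HOME/config, removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp/.config" GIT_CONFIG_NOSYSTEM=1
unset GIT_CONFIG_GLOBAL
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master
git config --global protocol.file.allow always  # allow local-path submodule clones

commit_file() {  # commit_file DIR TEXT: write TEXT to DIR/file.txt and commit it
  echo "$2" > "$1/file.txt"
  git -C "$1" add file.txt
  git -C "$1" commit -q -m "$2"
}

# Hypothetical nested layout: super -> middle -> inner.
git init -q "$tmp/inner-up";  commit_file "$tmp/inner-up" inner
git init -q "$tmp/middle-up"; commit_file "$tmp/middle-up" middle
git -C "$tmp/middle-up" submodule --quiet add "$tmp/inner-up" inner
git -C "$tmp/middle-up" commit -q -m "add inner"
git init -q "$tmp/super-up";  commit_file "$tmp/super-up" super
git -C "$tmp/super-up" submodule --quiet add "$tmp/middle-up" middle
git -C "$tmp/super-up" commit -q -m "add middle"
git clone -q --recurse-submodules "$tmp/super-up" "$tmp/work"  # clones inner too
cd "$tmp/work"

# foreach runs the command inside each submodule; print that directory's name.
immediate=$(git submodule --quiet foreach 'basename "$(pwd)"')
recursive=$(git submodule --quiet foreach --recursive 'basename "$(pwd)"')
echo "foreach:             $(echo $immediate)"
echo "foreach --recursive: $(echo $recursive)"
```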

answered Nov 15 '22 at 17:11 by vcsjones
