I have a repo with thousands of remotes, and I'd like to pull from many of them at the same time, ideally with a way to specify the maximum number of simultaneous pulls.
I wasn't able to find anything related to this in the manpages, on Google, or on git-scm online.
To be perfectly clear: I do not want to run one command over multiple repos. I have one repo with thousands of remotes.
This has nothing to do with submodules, don't talk about submodules. Submodules are unrelated to git remotes.
Starting with Git 2.24, this is possible with the --jobs (-j) option of git fetch.
Some examples:
Fetching 3 remotes, with up to 2 fetched in parallel:
git fetch -j2 --multiple remote1 remote2 remote3
Fetching all remotes, with up to 5 fetched in parallel:
git fetch --jobs=5 --all
If you have thousands of remotes that form logical groups and you don't want to fetch all of them, you don't have to list them on the command line (with the --multiple option). You can define remote groups in .git/config instead:
[remotes]
group1 = remote1 remote2 origin
group2 = remote55 remote66
Then use the groups in the fetch command. This command:
git fetch --multiple -j4 group1 group2 remote10
fetches remote1, remote2, origin, remote55, remote66, and remote10, with up to 4 fetches running in parallel.
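If you'd rather not edit .git/config by hand, the same kind of group can be set with git config. Below is a minimal sketch that builds throwaway repositories under a temp directory so it can be run anywhere. The names remote1, remote2, and group1 are just the placeholder names used above, and the scratch layout is an assumption for illustration. It also sets fetch.parallel, the config counterpart of --jobs (Git 2.24 or later), so -j can be left off the command line.

```shell
#!/bin/sh
set -e

# Scratch area with two throwaway "upstream" repos, one commit each.
work=$(mktemp -d)
for name in remote1 remote2; do
  git init -q "$work/$name"
  git -C "$work/$name" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "commit from $name"
done

# The repo that has many remotes (only two here, for illustration).
git init -q "$work/main"
cd "$work/main"
git remote add remote1 "$work/remote1"
git remote add remote2 "$work/remote2"

# Same effect as a [remotes] section with: group1 = remote1 remote2
git config remotes.group1 "remote1 remote2"

# Default number of parallel fetches when -j/--jobs is not given.
git config fetch.parallel 2

# Fetch every remote in the group, up to 2 at a time.
git fetch -q --multiple group1

# List the remote-tracking branches that were created.
git branch -r
```

In a real repository only the two git config lines and the git fetch line are needed; the rest is setup for the demo.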
I'm pretty sure you have to write your own code to do this.
As CodeWizard says in a comment, Git needs to lock parts of the repository. Some of these locks are bound to collide at times if you simply run multiple git fetch processes in parallel within a single repository.
You might also want some kind of remote-ordering strategy. For example, fetching from remoteA, remoteB, and remoteC in parallel may turn up 10000 objects on remoteB that are also on the other two, if remoteB is generally (but not always) a superset of remoteA and remoteC. [1] While this also applies to sequential git fetch operations, it is considerably less important there. Suppose, for example, that there are 5000 objects (some commits, some trees, and some blobs) on A that you do not yet have, 5000 others on C, and all 10000 on B. If you fetch sequentially, in any order, you pick up either 5k, then 5k, then 0; or 10k, then 0, then 0; because by the time you move to the next remote, you have collected and stored the 5k or 10k incoming objects. But if you do all three in parallel, you will bring in 5k, 5k, and 10k objects, and only then discover that you have doubled your workload.
[1] If B is always a superset, simply fetch from B first (sequentially), then fetch from A and C in parallel solely for their references, which will point to objects you now have.
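The "superset first" ordering can be sketched as a small shell function. This is only an illustration under the assumption that you already know which remote is the superset. The function name fetch_superset_first and its argument order are made up for this example; the git fetch options it uses are the --jobs and --multiple options available since Git 2.24.

```shell
# Usage: fetch_superset_first JOBS SUPERSET REMOTE...
# (hypothetical helper, not part of Git)
fetch_superset_first() {
  jobs=$1
  superset=$2
  shift 2

  # Step 1: fetch the superset remote on its own, so its objects are
  # stored locally before any other remote is contacted.
  git fetch "$superset" || return 1

  # Step 2: fetch the remaining remotes in parallel. Objects they share
  # with the superset are already present, so they need not be
  # downloaded a second time.
  git fetch --jobs="$jobs" --multiple "$@"
}
```

With the remotes from the example above, this would be called as: fetch_superset_first 2 remoteB remoteA remoteC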