When I run git submodule update --init for the first time on a project which has a lot of submodules, it usually takes a long time, because most of the submodules are stored on slow public servers.
Is there a way to initialize the submodules asynchronously?
The git submodule init command creates the local configuration for the submodules if it does not already exist. If you track branches in your submodules, you can update them via the --remote parameter of the git submodule update command.
Submodules are very static and only track specific commits. They do not track git refs or branches and are not automatically updated when the host repository is updated. When you add a submodule to a repository, a new .gitmodules file is created.
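For reference, a .gitmodules entry looks roughly like this (the submodule name, path, URL and branch below are placeholders, not taken from the question); this is also what the Python script further down parses for its path = lines:

# illustrative entry only; name, path, url and branch are made up
[submodule "libfoo"]
    path = vendor/libfoo
    url = https://example.com/libfoo.git
    branch = main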
If you already cloned the project and forgot --recurse-submodules, you can combine the git submodule init and git submodule update steps by running git submodule update --init. To also initialize, fetch and check out any nested submodules, you can use the foolproof git submodule update --init --recursive.
If you pass --recurse-submodules to the git clone command, it will automatically initialize and update each submodule in the repository, including nested submodules if any of the submodules in the repository have submodules themselves.
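For example (the repository URL below is a placeholder):

# clone and set up all submodules, including nested ones, in one step
git clone --recurse-submodules https://example.com/project.git

# equivalent for a repository that was already cloned without the flag
git submodule update --init --recursive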
This can also be done in Python. In Python 3 (because we're in 2015...), we can use something like this:
#!/usr/bin/env python3
import os
import re
import subprocess
import sys
from functools import partial
from multiprocessing import Pool
def list_submodules(path):
    """Parse .gitmodules and return the list of submodule paths."""
    with open(os.path.join(path, ".gitmodules"), 'r') as gitmodules:
        # raw string so the backslashes are not treated as escape sequences
        matches = re.findall(r"path = ([\w\-_\/]+)", gitmodules.read())
    return matches

def update_submodule(name, path):
    # run "git submodule update --init <name>" from the repository root
    cmd = ["git", "-C", path, "submodule", "update", "--init", name]
    return subprocess.call(cmd, shell=False)

if __name__ == '__main__':
    if len(sys.argv) != 2:
        sys.exit(2)

    root_path = sys.argv[1]

    # one worker per CPU by default; each worker updates one submodule at a time
    p = Pool()
    p.map(partial(update_submodule, path=root_path), list_submodules(root_path))
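Assuming the script is saved as update_submodules.py (the file name is arbitrary), it can be invoked like this:

# pass the path to the already-cloned repository as the only argument
python3 update_submodules.py /path/to/repo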
This may be safer than the one-liner given by @Karmazzin (since that one just keeps spawning processes without any control over how many are spawned), but it follows the same logic: read .gitmodules, then spawn multiple processes running the proper git command, here using a process pool (whose maximum number of processes can be set too). The path to the cloned repository needs to be passed as an argument. This was tested extensively on a repository with around 700 submodules.
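If you want to bound the parallelism explicitly, multiprocessing.Pool accepts the number of worker processes as its first argument, for instance:

p = Pool(processes=4)  # run at most 4 "git submodule update" calls at a time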
Note that in the case of a submodule initialization, each process will try to write to .git/config, and locking issues may happen:
error: could not lock config file .git/config: File exists
Failed to register url for submodule path '...'
This can be caught with subprocess.check_output and a try/except subprocess.CalledProcessError block, which is cleaner than the sleep added to @Karmazzin's method. An updated method could look like:
def update_submodule(name, path):
    cmd = ["git", "-C", path, "submodule", "update", "--init", name]
    while True:
        try:
            # capture stderr so it is available on the raised exception
            subprocess.check_output(cmd, stderr=subprocess.PIPE, shell=False)
            return
        except subprocess.CalledProcessError as e:
            # another worker holds the lock on .git/config; just retry
            if b"could not lock config file .git/config: File exists" in e.stderr:
                continue
            else:
                raise
With this, I managed to run the init/update of 700 submodules during a Travis build without needing to limit the size of the process pool. I usually saw only a few locks caught that way (~3 at most).