Is there any way to execute a git submodule foreach command in parallel, similar to how the --jobs 8 parameter works with git submodule update?
For example, one of the projects we work on involves almost 200 sub-components (submodules), and we rely heavily on the foreach command to operate on them. I'd like to speed these operations up.
PS: In case the solution involves a script, I work on Windows and use git-bash most of the time.
I propose a solution based on a cross-platform interpreted language such as Python.
First of all, you need to define a class that manages the process launched for each command.
import os
import subprocess


class PFSProcess(object):
    def __init__(self, submodule, path, cmd):
        self.__submodule = submodule
        self.__path = path
        self.__cmd = cmd
        self.__output = None
        self.__p = None

    def run(self):
        self.__output = "\n\n" + self.__submodule + "\n"
        self.__p = subprocess.Popen(self.__cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True,
                                    cwd=os.path.join(self.__path, self.__submodule))
        # Call communicate() only once: it waits for the process and returns (stdout, stderr).
        stdout, stderr = self.__p.communicate()
        self.__output += stdout.decode('utf-8')
        if stderr:
            self.__output += stderr.decode('utf-8')
        print(self.__output)
The next step is to set up multithreaded execution. Python's standard library includes a module for working with threads, which you can use by importing it:
import threading
Before creating the threads, you need to define a worker, that is, the function each thread will call:
def worker(submodule_list, path, command):
    for submodule in submodule_list:
        PFSProcess(submodule, path, command).run()
As you can see, the worker receives a list of submodules. Building that list is outside the scope of this answer, but you can generate it by reading the .gitmodules file.
As a starting point, each submodule entry in that file contains a line like this:
path = relative_path/project
To find those lines you can use this regular expression:
'path ?= ?([A-Za-z0-9_-]+)(\/[A-Za-z0-9_-]+)*([A-Za-z0-9_-])'
If the regular expression matches, you can extract the relative path from the same line with the following one:
' ([A-Za-z0-9_-]+)(\/[A-Za-z0-9_-]+)*([A-Za-z0-9_-])'
Note that this second regular expression returns the relative path with a leading space character, so strip it before use.
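As a rough illustration of the parsing step described above, here is a minimal sketch. The helper name parse_submodule_paths and the sample file contents are hypothetical and not taken from the PFS project. It also assumes a looser pattern than the one above: it captures everything after "path =" and trims the whitespace, which sidesteps the leading-space issue.

```python
import re

# Hypothetical helper (not part of the PFS project): collect the value of
# every "path = ..." line found in the text of a .gitmodules file.
PATH_LINE = re.compile(r'^\s*path\s*=\s*(.+?)\s*$')


def parse_submodule_paths(gitmodules_text):
    paths = []
    for line in gitmodules_text.splitlines():
        match = PATH_LINE.match(line)
        if match:
            # group(1) is the relative path with surrounding whitespace removed
            paths.append(match.group(1))
    return paths


# Made-up sample content, for illustration only.
sample = """[submodule "libs/core"]
\tpath = libs/core
\turl = https://example.com/core.git
[submodule "tools"]
\tpath = tools
\turl = https://example.com/tools.git
"""

print(parse_submodule_paths(sample))
```

In a real repository you would pass the contents of the .gitmodules file at the repository root instead of the sample string.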
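Then split the submodule list into as many chunks as the number of jobs you want:
num_jobs = 8
# One empty chunk per job; submodules are assigned round-robin.
list_submodule_list = [[] for _ in range(num_jobs)]
for i, submodule in enumerate(submodules):
    list_submodule_list[i % num_jobs].append(submodule)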
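To make the round-robin distribution concrete, here is a small self-contained sketch. The function name split_round_robin and the sample names are hypothetical, chosen only for illustration.

```python
def split_round_robin(items, num_jobs):
    # Create one empty chunk per job, then deal the items out in turn,
    # so chunk sizes differ by at most one.
    chunks = [[] for _ in range(num_jobs)]
    for index, item in enumerate(items):
        chunks[index % num_jobs].append(item)
    return chunks


print(split_round_robin(["a", "b", "c", "d", "e"], 2))
# prints [['a', 'c', 'e'], ['b', 'd']]
```

If there are fewer submodules than jobs, some chunks stay empty, which is harmless: the corresponding worker simply has nothing to do.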
Finally dispatch each chunk (job) to each thread and wait until all threads finish:
for i in range(num_jobs):
    t = threading.Thread(target=worker, args=(list_submodule_list[i], self.args.path, self.args.command,))
    self.__threads.append(t)
    t.start()
for i in range(num_jobs):
    self.__threads[i].join()
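The snippets above are excerpts from a larger class. As a standalone sketch of the same idea (split into chunks, one thread per chunk, then join), something like the following could work. The names parallel_foreach and run_in_submodule are hypothetical, and unlike the original this version collects results in a dictionary instead of printing them; it is not the PFS implementation.

```python
import os
import subprocess
import threading


def run_in_submodule(submodule, path, cmd, results, lock):
    # Run the shell command inside the submodule directory and capture its output.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            shell=True, cwd=os.path.join(path, submodule))
    stdout, stderr = proc.communicate()
    with lock:
        results[submodule] = (proc.returncode,
                              stdout.decode('utf-8', errors='replace'),
                              stderr.decode('utf-8', errors='replace'))


def worker(chunk, path, cmd, results, lock):
    # Each thread processes its own chunk sequentially.
    for submodule in chunk:
        run_in_submodule(submodule, path, cmd, results, lock)


def parallel_foreach(submodules, path, cmd, num_jobs=8):
    # Round-robin split: chunk i gets items i, i + num_jobs, i + 2 * num_jobs, ...
    chunks = [submodules[i::num_jobs] for i in range(num_jobs)]
    results = {}
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(chunk, path, cmd, results, lock))
               for chunk in chunks if chunk]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A call might look like parallel_foreach(paths, repo_root, "git status --short", num_jobs=8), where paths is the list of relative submodule paths and repo_root is the top-level repository directory. Threads are adequate here because the work happens in child processes, so the threads mostly wait on I/O.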
This only covers the basic concepts; the full implementation is available in the parallel_foreach_submodule (PFS) project on GitHub.
A simple, Bash-only solution is the following (replace <command> with your command):
IFS=$'\n'
for DIR in $(git submodule foreach -q sh -c pwd); do
    cd "$DIR" && <command> &
done
wait
As a generic command (create a file called "git-foreach-parallel"):
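#!/bin/bash
if [ -z "$1" ]; then
    echo "Missing Command" >&2
    exit 1
fi
IFS=$'\n'
for DIR in $(git submodule foreach -q sh -c pwd); do
    # Run "$@" directly: with IFS set to a newline, an unquoted variable
    # holding the command would not be split into separate words.
    cd "$DIR" && "$@" &
done
wait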