
Execute "git submodule foreach" in parallel

Is there any way to execute a git submodule foreach command in parallel, similar to how the --jobs 8 parameter works with git submodule update?

For example, one of the projects we work on involves almost 200 sub-components (submodules) and we heavily use the foreach command to operate on them. I'd like to speed them up.

PS: In case the solution involves a script, I work on Windows and, most of the time, use git-bash.

cbuchart asked Apr 24 '18 10:04


2 Answers

I propose a solution based on a cross-platform interpreted language such as Python.


Process Launcher


First, define a class that manages the process that runs the command.

import os
import subprocess


class PFSProcess(object):
    def __init__(self, submodule, path, cmd):
        self.__submodule = submodule
        self.__path = path
        self.__cmd = cmd
        self.__output = None
        self.__p = None

    def run(self):
        self.__output = "\n\n" + self.__submodule + "\n"
        # Run the command inside the submodule's directory.
        self.__p = subprocess.Popen(self.__cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True,
                             cwd=os.path.join(self.__path, self.__submodule))
        # Call communicate() only once: it waits for the process and returns both streams.
        stdout, stderr = self.__p.communicate()
        self.__output += stdout.decode('utf-8')
        if stderr:
            self.__output += stderr.decode('utf-8')
        # Print the buffered output in one call to help keep each submodule's output together.
        print(self.__output)


Multithreading


The next step is to run the commands in multiple threads. Python's standard library includes the threading module for working with threads. Import it as follows:

import threading

Before creating the threads, define a worker, the function that each thread will call:

def worker(submodule_list, path, command):
    for submodule in submodule_list:
        PFSProcess(submodule, path, command).run()

As you can see, the worker receives a list of submodules. Building that list is out of scope here, but you can generate it by reading the .gitmodules file.


💡 < Tip >

As a starting point, each submodule entry in .gitmodules contains a line like this:

path = relative_path/project

To detect that line you can use this regular expression:

'path ?= ?([A-Za-z0-9_-]+)(\/[A-Za-z0-9_-]+)*([A-Za-z0-9_-])'

If the regular expression matches, you can extract the relative path from the same line with the following one:

' ([A-Za-z0-9_-]+)(\/[A-Za-z0-9_-]+)*([A-Za-z0-9_-])'

Note that this second regular expression returns the relative path with a leading space character.

💡 < / Tip>
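
As a rough sketch of the tip above, the paths could also be collected with a single, looser regular expression instead of the two shown. The function name read_submodule_paths and the sample text are hypothetical, and the pattern assumes every path is declared on its own "path = ..." line.

```python
import re

# Hypothetical helper: extract submodule paths from the text of a .gitmodules file.
# Assumes every path is declared on its own "path = ..." line.
PATH_LINE = re.compile(r'^\s*path\s*=\s*(.+?)\s*$', re.MULTILINE)


def read_submodule_paths(gitmodules_text):
    """Return the relative path of every submodule, in file order."""
    return PATH_LINE.findall(gitmodules_text)


# Made-up sample content, only for illustration.
sample = '''[submodule "libs/core"]
\tpath = libs/core
\turl = https://example.com/core.git
[submodule "tools"]
\tpath = tools
\turl = https://example.com/tools.git
'''

submodules = read_submodule_paths(sample)
print(submodules)
```

In a real repository you would pass the contents of the .gitmodules file instead of the sample string.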


Then split the submodule list into as many chunks as the number of jobs you want:

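num_jobs = 8

# One empty chunk per job.
list_submodule_list = [[] for _ in range(num_jobs)]

# Distribute the submodules round-robin across the chunks.
for i, submodule in enumerate(submodules):
    list_submodule_list[i % num_jobs].append(submodule)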

Finally, dispatch each chunk (job) to its own thread and wait until all threads finish:

for i in range(num_jobs):
    t = threading.Thread(target=worker, args=(list_submodule_list[i], self.args.path, self.args.command,))
    self.__threads.append(t)
    t.start()

for i in range(num_jobs):
    self.__threads[i].join()
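
Putting the pieces above together, a minimal self-contained sketch might look like the following. It is not the PFS implementation: run_parallel, its parameters and the returned dictionary are assumptions made for illustration, and each worker collects the output instead of printing it.

```python
import os
import subprocess
import threading


def run_parallel(submodules, path, command, num_jobs=8):
    """Run `command` in every submodule directory using up to num_jobs threads.

    Hypothetical helper: returns {submodule: combined stdout and stderr}.
    """
    # Round-robin split of the submodules into one chunk per job.
    chunks = [[] for _ in range(num_jobs)]
    for i, submodule in enumerate(submodules):
        chunks[i % num_jobs].append(submodule)

    results = {}
    lock = threading.Lock()

    def worker(chunk):
        # Each thread processes its own chunk sequentially.
        for submodule in chunk:
            proc = subprocess.Popen(command, shell=True,
                                    stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                                    cwd=os.path.join(path, submodule))
            output = proc.communicate()[0].decode('utf-8')
            with lock:
                results[submodule] = output

    # Start one thread per non-empty chunk, then wait for all of them.
    threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks if chunk]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, run_parallel(submodules, '.', 'git status --short') would run the command in every listed submodule of the repository in the current directory.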


This only covers the basic concepts; the full implementation is available in the parallel_foreach_submodule (PFS) project on GitHub.

RDCH106 answered Oct 02 '22 01:10


A simple, bash-only solution is the following (replace <command> with your command):

IFS=$'\n'
for DIR in $(git submodule foreach -q sh -c pwd); do
    cd "$DIR" && <command> &
done
wait

As a generic command (create a file called "git-foreach-parallel"):

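#!/bin/bash

if [ -z "$1" ]; then
    echo "Missing Command" >&2
    exit 1
fi

# Split the submodule list on newlines only, so paths containing spaces survive.
IFS=$'\n'
for DIR in $(git submodule foreach -q sh -c pwd); do
    # "$@" keeps the command and its arguments as separate words even though IFS is a newline.
    cd "$DIR" && "$@" &
done
# Wait for all background jobs to finish.
wait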

NormalGaussian answered Oct 02 '22 03:10