
Highly coupled git submodules

I have a project which needs to be split into two repositories: a set of common models, and a simulation based on those models, with additional code. Ultimately there may be multiple simulations using the same set of models, so having them in a separate repository is a definite requirement. The obvious solution is to have the common models as a submodule of the simulation.

Unfortunately, the two repositories will be very highly coupled. People will be very frequently adding something to their common models then immediately using it in the simulation. I imagine this will make for a lot of headaches in the integration process of the simulation's repo. In order to merge changes from many developers in the simulation, the integrator will have to do parallel merges in the common models submodule. On the other hand, it also makes it essential to use submodules - the simulation really needs to know which version of the common models it should be using.

The project is worked on by a sizeable number of people. Most of the developers have only a very cursory knowledge of git: they add files, commit, and pull from origin a lot, and hopefully have a dev and stable branch. The integrator has naturally learned quite a bit more, but anything involving submodules will certainly be new to him. Added bonus: I'm about to take a month of vacation, so I won't be able to put out any fires. The upshot is that there's a lot of incentive to make the workflow really hard to screw up, and to minimize the difference from people's previous workflows.

So, my questions are: am I going to regret recommending we use submodules for this? (Is there a better idea?) What kind of mistakes can I expect people to make, so I can warn against them in advance? Are there any good workflow strategies to keep in mind?

Edit: I just came across git slave, which might be worth a look in this context too. Can't yet give a good evaluation of abilities/limitations beyond what's on its website.

Cascabel asked Sep 14 '10 21:09




1 Answer

A few notes for anyone else happening across this!

The biggest mistake rookies are going to make is committing with detached HEAD in the submodule, after having done a submodule update. I'm going to try to counter this with strong warnings from hooks.
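Commits made on a detached HEAD are not lost immediately; they just have no branch pointing at them. As a minimal sketch of the recovery step (the function name `rescue_detached` and the default branch name `rescued-work` are made up for illustration, not part of the hooks in this answer), a developer could attach a branch to the current commit like this:

```shell
#!/bin/bash
# Sketch: keep commits made on a detached HEAD by creating a branch at the
# current commit. "rescue_detached" and "rescued-work" are hypothetical names.
rescue_detached() {
    local name=${1:-rescued-work}

    # symbolic-ref succeeds only when HEAD points at a branch.
    if git symbolic-ref -q HEAD > /dev/null; then
        echo "Already on branch '$(git symbolic-ref --short HEAD)'; nothing to rescue."
        return 0
    fi

    # Create the branch at the detached commit and switch to it.
    git checkout -q -b "$name" &&
        echo "Created branch '$name' at $(git rev-parse --short HEAD)"
}
```

Run inside the submodule, `rescue_detached my-fix` would leave the work on a branch named `my-fix`, which can then be merged or pushed as usual.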

The next biggest will probably be failing to do a submodule update after doing a checkout which requires one. Again, hooks can check for this and warn.

As for development process, this setup makes it much more important to have a good test infrastructure in the submodule, so that if possible you can work in it without having to do work in the parent, and avoid the issue entirely.

I'll try and post sample code from the hooks I end up using, and follow up after a month with (hopefully not too many) true horror stories.

Edit:

Here are the first drafts of the hooks. Keep in mind this is a rush job and go easy on me!

In the parent repo:

For post-merge and post-checkout, we warn the user if the submodule is out of sync. (post-merge is included mainly to cover fast-forward merges when pulling from origin.) The warning also tells them to check out a branch, although the submodule's post-checkout hook will say the same thing when they run submodule update. The more reminders the merrier.

#!/bin/bash
if git submodule status | grep '^+' > /dev/null; then
    echo "WARNING: common model submodule now out of sync. You probably want to run" 1>&2
    echo "         git submodule update, then make sure to check out an appropriate branch" 1>&2
    echo "         in the submodule." 1>&2
fi
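
The hook above only looks for the `+` prefix. `git submodule status` also uses `-` for a submodule that is not initialized and `U` for one with merge conflicts. As a rough sketch (the helper name `explain_submodule_status` and the message wording are illustrative, and paths containing spaces are not handled), the prefixes could be turned into hints like this:

```shell
#!/bin/bash
# Sketch: translate the one-character prefix of each `git submodule status`
# line into a hint. "explain_submodule_status" is a hypothetical helper name.
explain_submodule_status() {
    local line rest path
    while IFS= read -r line; do
        # Each line is "<prefix><commit> <path> [(<describe>)]".
        rest=${line#?}       # drop the prefix character
        rest=${rest#* }      # drop the commit id
        path=${rest%% *}     # keep the path, drop any "(describe)" suffix
        case $line in
            +*) echo "$path: out of sync with the commit recorded in the parent" ;;
            -*) echo "$path: not initialized" ;;
            U*) echo "$path: merge conflicts" ;;
        esac
    done
}

# Typical use inside the parent repository:
#   git submodule status | explain_submodule_status
```

Submodules that are in sync start with a space and produce no output, so an empty result means there is nothing to warn about.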

For post-commit, if there are submodule changes, we warn the user that they may have forgotten to include them in their commit. In this highly coupled case, this is a very good guess. It's unlikely the user will have modified the simulation and common models separately.

#!/bin/bash
if git submodule status | grep '^+' > /dev/null; then
    echo "WARNING: common model submodule has changes. If the commit you just made depends" 1>&2
    echo "         on those changes, you must run git add on the submodule, and then run" 1>&2
    echo "         git commit --amend to fix your commit." 1>&2
fi

And in the submodule, a post-checkout hook to strongly warn about detached HEAD:

#!/bin/bash

get_ppid() {
    ps --no-headers -o ppid $1
}

# Check to see if this checkout is part of a submodule update
# git submodule calls git checkout, which calls this script, so we need to
# check the grandparent process.
if ps --no-headers -o command $(get_ppid $(get_ppid $$)) | grep 'submodule update' &> /dev/null; then
    if ! git symbolic-ref HEAD &> /dev/null; then
        echo "WARNING: common model submodule entering detached HEAD state. If you don't know" 1>&2
        echo "         what this means, and you just ran 'git submodule update', you probably" 1>&2
        echo "         want to check out an appropriate branch in the submodule repository." 1>&2
        echo
        # escape the asterisk from SO's syntax highlighting (it sees C comments)
        branches=($(git for-each-ref --format='%(objectname) %(refname:short)' refs/heads/\* | grep ^$(git rev-parse HEAD) | cut -d\  -f2))
        case ${#branches[@]} in
            0 )
                ;;
            1 ) 
                echo "Branch '${branches[0]}' is at HEAD"
                ;;
            * )
                echo "The following branches are at HEAD: ${branches[@]}"
                ;;
        esac
    fi
    echo
fi

I'm also adding a pre-commit hook to simply abort commits made with detached HEAD (unless it's a rebase). I'm pretty terrified of getting the classic "all of my commits disappeared" panicked complaint. You can always bypass it with --no-verify if you know what you're doing.
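
The pre-commit hook described above is not shown in this post. The following is only a sketch of how such a hook might look, not the author's actual script. It assumes a rebase can be detected by the `rebase-merge` or `rebase-apply` directories inside the git directory, and the function names are made up for illustration:

```shell
#!/bin/bash
# Sketch of a pre-commit hook for the submodule: refuse commits on a detached
# HEAD unless a rebase is in progress. Function names are hypothetical.

# Succeeds when HEAD points at a branch (i.e. is not detached).
on_branch() {
    git symbolic-ref -q HEAD > /dev/null 2>&1
}

# Succeeds when git's rebase state directories exist. Assumption: these
# directories are how an in-progress rebase is recognized (rebase-apply is
# also present during `git am`).
rebase_in_progress() {
    local git_dir
    git_dir=$(git rev-parse --git-dir 2> /dev/null) || return 1
    [ -d "$git_dir/rebase-merge" ] || [ -d "$git_dir/rebase-apply" ]
}

# Returns 0 if the commit may proceed, 1 (with an error message) otherwise.
allow_commit() {
    if on_branch || rebase_in_progress; then
        return 0
    fi
    echo "ERROR: refusing to commit with detached HEAD. Check out a branch first," 1>&2
    echo "       or use 'git commit --no-verify' if you know what you're doing." 1>&2
    return 1
}

# Run the check only when this file is installed as .git/hooks/pre-commit,
# so it can also be sourced or tested without side effects.
case ${0##*/} in
    pre-commit) allow_commit || exit 1 ;;
esac
```

A non-zero exit status from pre-commit makes git abort the commit, and `--no-verify` skips the hook entirely, which matches the bypass mentioned above.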

Cascabel answered Oct 06 '22 02:10