
What is the best way to write a git update hook that rejects invalid submodule commits?

I am attempting to write an update hook for git that bounces if a submodule is being updated to a commit ID that does not exist in the submodule's upstream repository. To say it another way, I want to force users to push changes to the submodule repositories before they push changes to the submodule pointers.

One caveat:

  • I only want to test submodules whose bare, upstream repositories exist on the same server as the parent repository. Otherwise we start having to do crazy things like call 'git clone' or 'git fetch' from within a git hook, which would not be fun.

I have been playing around with an idea but it feels like there must be a better way to do this. Here is what I was planning on doing in the update hook:

  1. Check the refname passed into the hook to see if we are updating something under refs/heads/. If not, exit early.
  2. Use git rev-list to get a list of revisions being pushed.
  3. For each revision:
    1. Call git show <revision_id> and use a regular expression to check whether a submodule was updated (by searching for `^\+Subproject commit [0-9a-f]+`).
    2. If this commit did change a submodule, get the contents of the .gitmodules files as seen by that particular commit (git show <revision_id>:.gitmodules).
    3. Use the results of 3.1 and 3.2 to get a list of submodule URLs and their updated commit IDs.
    4. Check this list created in 3.3 against an external file that maps submodule URLs to local bare git repositories on the filesystem.
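    5. cd to the paths found in 3.4 and execute git rev-parse --quiet --verify <updated_submodule_commit_id>^{commit} to see if that commit exists in that repository. (Without the ^{commit} suffix, rev-parse accepts any well-formed full-length ID without checking that the object is actually there.) If it does not exist, exit with a non-zero status.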
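A minimal sketch of steps 3.1 through 3.3, assuming the hook is written in Python (the question does not name a language). The function names `updated_submodules` and `parse_gitmodules` are made up for illustration, and the parsing is deliberately simple: it assumes unquoted paths and could be fooled by an ordinary file that happens to add a line reading `Subproject commit ...`.

```python
import re

SUBPROJECT_RE = re.compile(r"^\+Subproject commit ([0-9a-f]+)$")
NEW_PATH_RE = re.compile(r"^\+\+\+ b/(.+)$")
SECTION_RE = re.compile(r'^\[submodule "(.+)"\]$')


def updated_submodules(show_output):
    """Step 3.1: map submodule path -> new commit ID from `git show <rev>` text."""
    updates, path = {}, None
    for line in show_output.splitlines():
        m = NEW_PATH_RE.match(line)
        if m:
            # Remember which file the following hunk belongs to.
            path = m.group(1)
            continue
        m = SUBPROJECT_RE.match(line)
        if m and path is not None:
            updates[path] = m.group(1)
    return updates


def parse_gitmodules(text):
    """Steps 3.2-3.3: map submodule path -> URL from the text of .gitmodules."""
    entries, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        if line.startswith("["):
            m = SECTION_RE.match(line)
            current = entries.setdefault(m.group(1), {}) if m else None
        elif current is not None and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            current[key] = value
    return {e["path"]: e["url"] for e in entries.values()
            if "path" in e and "url" in e}
```

Joining the two results on the path gives the list of (URL, commit ID) pairs that step 3.4 needs.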

(Note: I believe the results of 3.2 can potentially be cached across revisions as long as the output of git rev-parse --quiet --verify <revision_id>:.gitmodules doesn't change from one revision to the next. I left this part out to simplify the solution.)

So yeah, this seems pretty complex, and I can't help but wonder if there are some internal git commands that might make my life a lot easier. Or maybe there is a different way to think about the problem?

Sebastian Celis asked Jan 21 '11 20:01



1 Answer

Edit, much later: As of Git 1.7.7, git-push now has a --recurse-submodules=check option, which refuses to push the parent project if any submodule commits haven't been pushed to their remotes. It doesn't appear that a corresponding push.recurseSubmodules config parameter has been added yet. This of course doesn't entirely address the problem - a clueless user could still push without the check - but it's quite relevant!

I think the best approach, rather than examining each individual commit, is to look at the diff across all of the pushed commits: git diff <old> <new>. You don't want to look at the whole diff though, really; it could be enormous. Unfortunately, the git-submodule porcelain command doesn't work in bare repos, but you should still be able to quickly examine .gitmodules to get a list of paths (and maybe URLs). For each one, you can git diff <old> <new> -- path, and if there is a diff, grab the new submodule commit. (And if you're worried about a 000000 old commit possibility, you can just use git show on the new one, I believe.)
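One way to sketch the "diff across the whole push" idea in Python. Note the swap: instead of running git diff <old> <new> -- path once per submodule, this uses the plumbing command git diff-tree -r --raw, whose output marks submodule pointers with mode 160000, so no prior knowledge of the paths is needed. The function names are hypothetical, the code assumes paths without unusual characters (git quotes those), and it does not handle the all-zero old ID of a newly created ref.

```python
import subprocess

GITLINK_MODE = "160000"  # tree-entry mode git uses for submodule pointers


def parse_raw_gitlinks(raw_diff):
    """Map path -> new submodule commit from `git diff-tree -r --raw` text."""
    updates = {}
    for line in raw_diff.splitlines():
        if not line.startswith(":"):
            continue
        # Raw format: ":<old mode> <new mode> <old id> <new id> <status>\t<path>"
        meta, _, path = line.partition("\t")
        old_mode, new_mode, old_sha, new_sha, status = meta[1:].split()
        if new_mode == GITLINK_MODE and old_sha != new_sha:
            updates[path] = new_sha
    return updates


def changed_gitlinks(old, new):
    """Diff the two ends of one pushed ref; assumes cwd is the bare parent repo."""
    out = subprocess.run(
        ["git", "diff-tree", "-r", "--raw", old, new],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_raw_gitlinks(out)
```

Deleted submodules are skipped automatically, because their new mode is 000000 rather than 160000.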

Once you get all that taken care of, you've reduced the problem to checking whether given commits exist in given remote repositories. Unfortunately, as it looks like you've noticed, that's not straightforward, at least as far as I know. Keeping local, up-to-date clones is probably your best bet, and it sounds like you're good there.
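For the existence check itself, here is a small Python sketch, assuming the submodule's bare repository sits at a known path on the same server. The name `commit_exists` and the `run` parameter (there only so the function can be exercised without a real repository) are illustrative. It uses git cat-file -e with a ^{commit} suffix, which fails unless the ID resolves to a commit object that is actually present.

```python
import subprocess


def commit_exists(git_dir, sha, run=subprocess.run):
    """Return True if `sha` names a commit stored in the bare repo at git_dir."""
    result = run(
        ["git", "--git-dir=" + git_dir, "cat-file", "-e", sha + "^{commit}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

This only tells you the object is in the repository, not that it is reachable from any branch there; whether that distinction matters depends on how the upstream repositories are maintained.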

By the way, I don't think the caching is going to be relevant here, since the update hook runs once per ref. Yes, you could do this in a pre-receive hook, which gets all the refs on stdin, but I don't see why you should bother doing more work. It's not going to be an expensive operation, and with an update hook, you can individually accept or reject the various branches being pushed, instead of preventing all of them from being updated because only one was bad.
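To make the per-ref behaviour concrete, a rough Python skeleton of an update hook follows. Git invokes the hook with the ref name, the old ID and the new ID as arguments, and a non-zero exit rejects only that ref. `should_check`, `update_hook` and the `find_missing` callback are hypothetical names; the callback stands in for whatever submodule check you end up writing.

```python
import sys


def should_check(refname, new):
    """Only branch updates that are not deletions need the submodule check."""
    return refname.startswith("refs/heads/") and set(new) != {"0"}


def update_hook(argv, find_missing=lambda old, new: []):
    """Return the exit status for one ref: 0 accepts it, 1 rejects it.

    find_missing is a hypothetical callback returning (path, sha) pairs whose
    commits are absent upstream; the default accepts everything.
    """
    refname, old, new = argv[1:4]
    if not should_check(refname, new):
        return 0
    missing = find_missing(old, new)
    for path, sha in missing:
        # Anything written to stderr is relayed to the person pushing.
        print(f"{refname}: submodule {path} points at unpushed commit {sha}",
              file=sys.stderr)
    return 1 if missing else 0
```

The hook script itself would end by calling sys.exit(update_hook(sys.argv)).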

If you want to save some trouble, I'd probably just avoid parsing the .gitmodules file, and hardcode a list of submodules into the hook. I doubt your list of submodules changes very often, so it's probably cheaper to maintain that than to write something automated.
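A hardcoded list could be as small as the Python sketch below. Every path in it is made up; substitute your own submodule paths and bare repository locations. Submodules that are not in the table (for example, ones hosted on another server) are simply skipped.

```python
# Hypothetical mapping: submodule path in the parent repo -> bare upstream
# repository on this server. Both sides are placeholders.
LOCAL_SUBMODULE_REPOS = {
    "libs/foo": "/srv/git/foo.git",
    "libs/bar": "/srv/git/bar.git",
}


def local_repos_for(updates, repos=LOCAL_SUBMODULE_REPOS):
    """Pair each updated submodule hosted here with its new commit ID.

    `updates` maps submodule path -> new commit ID; unknown paths are skipped.
    """
    return [(repos[path], sha)
            for path, sha in sorted(updates.items())
            if path in repos]
```

Each (repository, commit) pair returned is then one existence check to run.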

Cascabel answered Sep 22 '22 12:09