Best practices for multiple git repositories

I have around 20 different repositories. Many are independent and compile as libraries but some others have dependencies among them. Dependency resolution and branching is complicated.

Suppose that I have a super project that only aggregates all other repositories. It is used exclusively to run tests -- no real development goes here.

/superproject  [master, HEAD]
    /a         [master, HEAD]
    /b         [master, HEAD]
    /c         [master, HEAD]
    /...

Now, to develop a feature or fix for one project (a), especially one that requires specific versions of other projects to compile or run (b v2.0 and c v3.0), I have to create a new branch:

/superproject  [branch-a, HEAD]  <-- branch for 'a' project
    /a         [master]  <-- new commits here
    /b         [v2.0]
    /c         [v3.0]

For b, something else might be required, such as a v0.9 and c v3.1:

/superproject  [branch-b, HEAD]  <-- branch for 'b' project
    /a         [v0.9]   <-- older version than 'a'
    /b         [master] <-- new commits go here
    /c         [v3.1]   <-- newer version than 'a'

This becomes even more complicated when implementing common git workflows involving feature branches, hotfix branches, release branches, etc. I was advised to use (and advised against using) git-submodules, git-subtree, Google's git-repo, git-slave, etc.

How can I manage continuous integration for such a complex project?

EDIT

The real question is: how do I run tests without having to mock all the other dependent projects, especially when the projects might use different versions? Related: Trigger Jenkins tests after commits in git submodules

betodelrio asked Jul 03 '15 17:07


2 Answers

For working with multiple branches in parallel, use parallel clones if possible. cd is an awful lot easier than checkout and clean and check-for-stale-detritus and recreate-caches every time you want to switch.
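One way to set up parallel clones, as a rough sketch: the helper name clone_per_branch and the SRC-BRANCH directory naming are made up for illustration, and it assumes the source repo is a local path with simple branch names (no slashes).

```shell
# clone_per_branch SRC BRANCH...
# Hypothetical helper: makes one sibling clone of SRC per branch, named
# SRC-BRANCH, so that switching branches is just a cd.
clone_per_branch() {
    src=$1; shift
    for branch in "$@"; do
        # --branch makes the new clone start out on that branch
        git clone -q --branch "$branch" "$src" "$src-$branch" || return 1
    done
}
```

Something like `clone_per_branch a feature hotfix` would then leave you with a, a-feature and a-hotfix side by side.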


So far as recording your test environments goes, what you're describing is exactly what submodules do, in every detail. For something this simple, I'm going to recommend setting yourself up without using the submodule command at all, and telling it about your setup once you're comfortable and the top item on your submodule-issues list is keystroke count.

Starting from the setup in your question, here's how you set yourself up to record clean builds in the subprojects:

cd $superproject
git init .
git add a b c etc
git commit -m "recording test state for $thistest"

That's it. You've committed a list of commit IDs, i.e. the IDs of the currently checked-out commits in each of those repos. The actual content is in those repos, not this one, but that's the entire difference between files and submodules so far as git's concerned. The .gitmodules file has random notes to help cloners, mainly a suggested repo that's supposed to contain the necessary commits, and random notes for command defaults, but what it's doing is easy and obvious.
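If you want to see that for yourself, here's a throwaway sketch. Everything in it (the temp directory, the file name, the demo identity passed with -c) is invented for the demonstration; only the add-and-commit step is what's described above.

```shell
#!/bin/sh
# Hypothetical throwaway layout: a superproject holding one nested repo "a".
set -e
work=$(mktemp -d)
mkdir -p "$work/superproject/a"
cd "$work/superproject/a"
git init -q .
echo 'hello' > lib.txt
git add lib.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial commit in a"
a_head=$(git rev-parse HEAD)

cd ..
git init -q .
git add a 2>/dev/null   # git prints an "embedded repository" hint here; silenced for the demo
git -c user.name=demo -c user.email=demo@example.com commit -q -m "recording test state"

# The superproject stores only a commit id for path a (mode 160000, a "gitlink").
recorded=$(git rev-parse HEAD:a)
entry_mode=$(git ls-files -s a | cut -d' ' -f1)
echo "a is at $a_head, superproject recorded $recorded (mode $entry_mode)"
```

The two IDs printed at the end should be the same: the superproject commit holds a pointer to a's commit and none of a's files.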

Want to check out the right commit at path foo?

(commit=`git rev-parse :foo`; cd foo; git checkout $commit)

The rev-parse fetches the commit id recorded for foo in the index; the cd and checkout then check that commit out inside foo.

Here's how you find all your submodules and what should be checked out there to recreate the staged aka indexed environment:

git ls-files -s | grep ^16

Check what your current index lists for a submodule and what's actually checked out there:

echo $(git rev-parse :$submodule; (cd $submodule; git rev-parse HEAD))

and there you go. Check out the right commits in all your submodules?

git ls-files -s | grep ^16 | while read mode commit stage path; do
        # && keeps the checkout from running in the superproject if the cd fails
        (cd "$path" && git checkout "$commit")
done
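Before running a loop like that, it can be handy to see which subprojects have actually drifted from what the index records. A minimal sketch, combining the ls-files listing with the rev-parse comparison; the function name check_pins and the output format are made up, and it assumes path names that git doesn't need to quote.

```shell
# check_pins: list every gitlink in the index and say whether the nested
# repo's HEAD matches it. Run from the top of the superproject.
check_pins() {
    git ls-files -s | grep '^160000' | while read -r mode commit stage path; do
        # "missing" covers a nested repo that is absent or has no commits yet
        actual=$(git -C "$path" rev-parse HEAD 2>/dev/null) || actual=missing
        if [ "$actual" = "$commit" ]; then
            echo "ok    $path $commit"
        else
            echo "drift $path index=$commit checked-out=$actual"
        fi
    done
}
```

Any line starting with drift is a subproject whose checked-out commit differs from the staged one.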

Sometimes you're carrying local patches you want applied to every checkout:

git ls-files -s | grep ^16 | while read mode commit stage path; do
        # quote the path and stop if the cd fails, same as the checkout loop
        (cd "$path" && git rebase "$commit")
done

and so forth. There are git submodule commands for these, but they're not doing anything you don't see above. Same for all the rest: you can translate everything they do into near-oneliners like the ones above.
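To tie this back to the layout in the question, here's a hedged sketch that records the environment for work on a (b at v2.0, c at v3.0) on a superproject branch. The repo names, tags and branch name come from the question; the helpers g and make_repo, the file contents and the demo identity are invented so the script can run on its own in a temp directory.

```shell
#!/bin/sh
# Throwaway reproduction of the question's layout, built in a temp directory.
set -e
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

work=$(mktemp -d)
mkdir "$work/superproject"
cd "$work/superproject"

# make_repo NAME TAG: nested repo with one tagged commit and one newer commit
make_repo() {
    mkdir "$1"
    (cd "$1" && git init -q . &&
        echo "$1 tagged" > file && git add file && g commit -q -m "$1 $2" && git tag "$2" &&
        echo "$1 newer" > file && g commit -q -am "$1 after $2")
}
make_repo a v0.9
make_repo b v2.0
make_repo c v3.0

git init -q .
g commit -q --allow-empty -m "empty superproject"

# Environment for work on 'a': a stays at its branch tip, b and c go to their tags.
git checkout -q -b branch-a
(cd b && git checkout -q v2.0)
(cd c && git checkout -q v3.0)
git add a b c 2>/dev/null   # silences the "embedded repository" hint
g commit -q -m "test environment for a: b v2.0, c v3.0"

pinned_b=$(git rev-parse branch-a:b)
tag_b=$(git -C b rev-parse 'v2.0^{commit}')
echo "branch-a pins b at $pinned_b (v2.0 is $tag_b)"
```

A branch-b environment would be recorded the same way, with a checked out at v0.9 and c at v3.1 before the add.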

There's nothing mysterious about submodules.


Continuous integration is generally done with any of a whole lot of tools; I'll leave that for someone else to address.

jthill answered Oct 20 '22 12:10


As its author, I think git slave could work in this situation. How to use it would depend on whether you have control over repos a, b and c; by which I mean whether you could synchronize the branch strategy between them so that the v2 branch means the same thing for everyone. If so, I would strongly urge git slave, since you can essentially treat everything as one large project.

If you could not mandate a common branch and tag strategy, then you would have to impose one yourself, which moves toward a lightweight version of the workflow that jthill suggested with git submodules. Specifically, you could have your own repo tracking a, b and c and create a branch in each one that corresponds to whatever the correct branch for each slave repo is. As with git submodules, you would have to bring each repo up to date manually (by merging, in this case). However, you would not need the mother-may-I step of making the commit in the superproject. This is not the slam-dunk use case, where the slave projects share the same branch name in their own development, but it will work.
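To make the imposed-branch idea concrete, here is a rough sketch in plain git. It does not use git slave commands at all; the function name sync_branch and the repo=startpoint argument format are invented for illustration.

```shell
# sync_branch NAME REPO=STARTPOINT...
# Hypothetical helper (plain git, not gitslave): creates or resets branch
# NAME in each listed repo at that repo's own start point and checks it out,
# so the same branch name means "the right version" everywhere.
sync_branch() {
    name=$1; shift
    for pair in "$@"; do
        repo=${pair%%=*}     # text before the first '='
        start=${pair#*=}     # text after the first '='
        git -C "$repo" checkout -q -B "$name" "$start" || return 1
    done
}
```

For the question's first environment that might be called as `sync_branch branch-a a=master b=v2.0 c=v3.0`; bringing a repo up to date afterwards is still a manual merge in that repo.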

As jthill said, continuous integration is pretty much orthogonal to the question of how to wrangle the projects.

Seth Robertson answered Oct 20 '22 12:10