
Git subrepositories

I am a member of a small firmware team and we use a private Git server for version control. Our codebase typically has folders for platform specific code, common code used by multiple platforms and an SDK provided by the manufacturer of the microprocessor we develop on.

Currently our repositories are centered around common code; each platform has a folder in the repository. This has at least two serious consequences:

  1. When we push changes for one platform, they're visible to all other platforms that share the same common code. This clutters the commit history with releases for multiple platforms and in the past has led to confusion about which platform we're reviewing changes for.

  2. Changes to common code need to be verified for all platforms in the repository before they can be committed, causing unnecessary pain. (Somebody didn’t do this yesterday and made changes that caused a couple of our older platforms to fail to build. This particular repository has 5 platforms in it.)

I'm pushing to have a repository for each platform so we can make changes to common code and upgrade the SDK periodically without being forced to update and verify all of the platforms we're supporting. Since we're a five-person team and these problems typically rear up during the crunch time leading up to a release, this would save us a ton of unnecessary brain damage and time. Naturally the hard part is in the implementation.

Right now I'm imagining a system where each platform has its own repository, each common code base has its own repository and each of the SDKs we use have their own repository. The platform repositories would link to or otherwise reference the common code and SDK they use as "sub-repositories."

I've read about submodules but I think they would cause more issues than they would fix; recently we had to debug three-year-old firmware, which required resetting our local codebase to the problem release to do static analysis. As I understand it, the local version of the submodule is detached from the repository in which it resides, which means resetting the repository to an old commit will leave the submodule in the "future" with respect to the rest of the repository. In the context of debugging old code this behavior is 100% unacceptable.

Here's a list of what I want to achieve, in order of importance:

  1. Break platform specific code away from common code so changes to common code and/or the SDK don't have unforeseen consequences on other platforms.

  2. Create new repositories for each platform, each common code base and each SDK.

  3. Cloning a remote repository to a local machine should be a one step process; new users should not be forced to pull each sub-repository before the platform will build.

    • In addition, restoring old code, including the common code and SDK that was used to build version XXX-YYY of any given platform, must be possible.

    • Edit: The last time I had to "restore" code I used a hard reset to the build I was debugging and a soft reset to a build a year older than it that I knew was good, but only so the Git plugin in my syntax highlighter would display changes between the known-good and known-bad for me.

      Incidentally this particular platform is monolithic and this question does not apply to it, but for the sake of argument we'll say it does apply.

  4. Modifying the common code should not affect the platforms until the maintainer of the platform pulls in the new common code. (I believe this is possible with subtrees by adding one of the common code's tagged commits instead of the common code's master.)

  5. A read-only sub-repository is an acceptable limitation. (i.e. the sub-repository either cannot be modified in the platform's repository or changes to the sub-repository cannot be pushed from the platform's repository)

  6. SmartGit/HG support is desirable as all but one of our members uses it.

  7. Scripting one-time tasks is acceptable, but I don't want to pollute platforms with scripts that do Git's job for it.

I've also read about subtrees but at this moment I'm unsure of whether they would intuitively permit the behavior I desire. My main question is this: does Git support this kind of functionality? If yes, is that functionality implemented by subtrees or by another method I am not yet aware of?

JacaByte asked Feb 12 '17 00:02





1 Answer

I'm only chiming in here because I think @VonC's answer doesn't fully explain why submodules do address @JacaByte's concerns.

I think they would cause more issues than they would fix; recently we had to debug three year old firmware, which required resetting our local codebase to the problem release to do static analysis. As I understand the local version of the submodule is detached from the repository in which it resides, which means resetting the repository to an old commit will leave the submodule in the "future" with respect to the rest of the repository.

This is not entirely incorrect. However, it is incorrect to say that submodules do not let you check out an old version of your codebase. The confusion arises because git checkout does not update submodule working trees by default; it leaves that job to you (git submodule update --init --recursive will do it for you).

Submodules are fantastic tools for controlling repository dependencies. In fact, they do specifically what you want them to: associate a specific version of your code with a specific version of the dependency. Specifically, a "submodule" is really just an entry in the parent repo's tree (mode 160000, type commit) that records the SHA-1 hash of a commit in the submodule repo. Git stores metainformation about the submodule remote in .gitmodules, which is how it knows where to get a copy of the submodule repo from.
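To make the "pointer to a commit" idea concrete, here's a minimal sketch you can run in a scratch directory. The repository names (common, platform) and the file common.c are hypothetical stand-ins I made up for your layout, and the protocol.file.allow override is only there because the demo uses a local path as the submodule remote; with a real Git server you shouldn't need it.

```shell
#!/bin/sh
# Sketch: a submodule is recorded as a commit pointer in the parent repo's tree.
# All names (common, platform, common.c) are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# Throwaway identity so the demo commits do not depend on global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the shared "common code" repository.
git init -q common
echo 'int common_version = 1;' > common/common.c
git -C common add common.c
git -C common commit -q -m 'common v1'

# Stand-in for one platform repository that pins the common code.
git init -q platform
cd platform
# The -c override is needed only because the remote here is a local path.
git -c protocol.file.allow=always submodule add "$work/common" common
git commit -q -m 'pin common code'

# What the platform repo stores for the submodule path ...
entry=$(git ls-tree HEAD -- common)
# ... versus the commit that is checked out inside the submodule.
pinned=$(git -C common rev-parse HEAD)

echo "$entry"
echo "$pinned"
cat .gitmodules
```

The ls-tree line should show mode 160000, type commit, and the same hash that rev-parse prints, while .gitmodules only holds the path and URL.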

Working with submodules, however, can be thorny. You need a good understanding of both the Git model itself (commits & working tree/staging/repo) and the submodule system. But if you make sure people know what they're doing (and this is a very big if), you can avoid shooting yourself in the foot.


You seem to be very specifically concerned about your ability to roll back to an old version of your codebase, and @VonC's answers do not seem to have allayed your concerns, so I'll provide a walkthrough here of how to do that:

I'll use my personal .dotfiles repo as an example (I pull in Vim extensions as submodules; at the time of this writing, HEAD is d9c0a797ad45a0d2fd92a07d3c3802528ed7b82a):

$ git clone https://github.com/sxlijin/.dotfiles dotfiles
Cloning into 'dotfiles'...
remote: Counting objects: 350, done.
remote: Compressing objects: 100% (31/31), done.
remote: Total 350 (delta 12), reused 0 (delta 0), pack-reused 318
Receiving objects: 100% (350/350), 86.61 KiB | 0 bytes/s, done.
Resolving deltas: 100% (176/176), done.
$ cd dotfiles/
$ git submodule update --init --recursive
Submodule 'bundle/jedi-vim' (http://github.com/davidhalter/jedi-vim) registered for path 'vim/bundle/jedi-vim'
Submodule 'bundle/nerdtree' (https://github.com/scrooloose/nerdtree.git) registered for path 'vim/bundle/nerdtree'
Submodule 'bundle/supertab' (https://github.com/ervandew/supertab.git) registered for path 'vim/bundle/supertab'
Submodule 'vim/bundle/vim-flake8' (https://github.com/nvie/vim-flake8.git) registered for path 'vim/bundle/vim-flake8'
Submodule 'bundle/vim-pathogen' (https://github.com/tpope/vim-pathogen.git) registered for path 'vim/bundle/vim-pathogen'
Cloning into '/home/pockets/dotfiles/vim/bundle/jedi-vim'...
warning: redirecting to https://github.com/davidhalter/jedi-vim/
Cloning into '/home/pockets/dotfiles/vim/bundle/nerdtree'...
Cloning into '/home/pockets/dotfiles/vim/bundle/supertab'...
Cloning into '/home/pockets/dotfiles/vim/bundle/vim-flake8'...
Cloning into '/home/pockets/dotfiles/vim/bundle/vim-pathogen'...
Submodule path 'vim/bundle/jedi-vim': checked out '8cf616b0887276e026aefdf68bc0311b83eec381'
Submodule 'jedi' (https://github.com/davidhalter/jedi.git) registered for path 'vim/bundle/jedi-vim/jedi'
Cloning into '/home/pockets/dotfiles/vim/bundle/jedi-vim/jedi'...
Submodule path 'vim/bundle/jedi-vim/jedi': checked out 'f05c0714c701ab784bd344aa063acd216fb45ec0'
Submodule path 'vim/bundle/nerdtree': checked out '281701021c5001332a862da80175bf585d24e2e8'
Submodule path 'vim/bundle/supertab': checked out 'cdaa5c27c5a7f8b08a43d0b2e65929512299e33a'
Submodule path 'vim/bundle/vim-flake8': checked out '91818a7d5f5a0af5139e9adfedc9d00fa963e699'
Submodule path 'vim/bundle/vim-pathogen': checked out '7ba2e1b67a8f8bcbafedaf6763580390dfd93436'

That last command, git submodule update --init --recursive, looked at the submodule hashes stored in HEAD and updated my working tree to match, adding the contents of the corresponding repositories at the corresponding paths. It does not update the submodules to the most recent commits in their respective repos; that's git submodule update --remote. It also works recursively, so if any of my submodules have submodules of their own (which they do), the contents of those repositories get added to my working tree too.

Now, it so happens that I updated my Vim plugins in HEAD~2:

$ git show HEAD~2 -- vim/bundle/*
commit 27bfe76851991026bd026b4bf2ab10d6ecbc6f74
Author: First Last <[email protected]>
Date:   Thu Feb 2 13:33:30 2017 -0600

    update dependencies

diff --git a/vim/bundle/jedi-vim b/vim/bundle/jedi-vim
index f191ccd..8cf616b 160000
--- a/vim/bundle/jedi-vim
+++ b/vim/bundle/jedi-vim
@@ -1 +1 @@
-Subproject commit f191ccd6fb7f3bc2272a34d6230487caf64face7
+Subproject commit 8cf616b0887276e026aefdf68bc0311b83eec381
diff --git a/vim/bundle/nerdtree b/vim/bundle/nerdtree
index eee431d..2817010 160000
--- a/vim/bundle/nerdtree
+++ b/vim/bundle/nerdtree
@@ -1 +1 @@
-Subproject commit eee431dbd44111c858c6d33ffd366cae1f17f8b3
+Subproject commit 281701021c5001332a862da80175bf585d24e2e8
diff --git a/vim/bundle/supertab b/vim/bundle/supertab
index 6651177..cdaa5c2 160000
--- a/vim/bundle/supertab
+++ b/vim/bundle/supertab
@@ -1 +1 @@
-Subproject commit 66511772a430a5eaad7f7d03dbb02e8f33c4a641
+Subproject commit cdaa5c27c5a7f8b08a43d0b2e65929512299e33a

Let's say that something seems to be weird with my Vim plugins right now, and I suspect that the above commit is responsible for this weirdness, so I want to roll back my plugins to whatever they were before I updated them.

$ git checkout -b testing HEAD~2^
M       vim/bundle/jedi-vim
M       vim/bundle/nerdtree
M       vim/bundle/supertab
Switched to a new branch 'testing'
$ git status
On branch testing
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   vim/bundle/jedi-vim (new commits)
        modified:   vim/bundle/nerdtree (new commits)
        modified:   vim/bundle/supertab (new commits)

no changes added to commit (use "git add" and/or "git commit -a")

Alright, now stuff seems a little weird. I didn't have any changes in my working tree when I did this checkout, so it seems that checkout is introducing changes to my working tree? What's going on here?

$ git diff
diff --git a/vim/bundle/jedi-vim b/vim/bundle/jedi-vim
index f191ccd..8cf616b 160000
--- a/vim/bundle/jedi-vim
+++ b/vim/bundle/jedi-vim
@@ -1 +1 @@
-Subproject commit f191ccd6fb7f3bc2272a34d6230487caf64face7
+Subproject commit 8cf616b0887276e026aefdf68bc0311b83eec381
diff --git a/vim/bundle/nerdtree b/vim/bundle/nerdtree
index eee431d..2817010 160000
--- a/vim/bundle/nerdtree
+++ b/vim/bundle/nerdtree
@@ -1 +1 @@
-Subproject commit eee431dbd44111c858c6d33ffd366cae1f17f8b3
+Subproject commit 281701021c5001332a862da80175bf585d24e2e8
diff --git a/vim/bundle/supertab b/vim/bundle/supertab
index 6651177..cdaa5c2 160000
--- a/vim/bundle/supertab
+++ b/vim/bundle/supertab
@@ -1 +1 @@
-Subproject commit 66511772a430a5eaad7f7d03dbb02e8f33c4a641
+Subproject commit cdaa5c27c5a7f8b08a43d0b2e65929512299e33a

Huh, OK, so this is interesting. For some reason, git diff is saying that the versions of the submodules checked out in my current working tree don't match up - but this diff looks a lot like the git show I did above. I wonder what submodules are actually in HEAD...

$ git ls-tree HEAD -- vim/bundle/
160000 commit f191ccd6fb7f3bc2272a34d6230487caf64face7  vim/bundle/jedi-vim
160000 commit eee431dbd44111c858c6d33ffd366cae1f17f8b3  vim/bundle/nerdtree
160000 commit 66511772a430a5eaad7f7d03dbb02e8f33c4a641  vim/bundle/supertab
160000 commit 91818a7d5f5a0af5139e9adfedc9d00fa963e699  vim/bundle/vim-flake8
160000 commit 7ba2e1b67a8f8bcbafedaf6763580390dfd93436  vim/bundle/vim-pathogen

Aha! There we go - git checkout just didn't update the version of the submodule in the working tree! Git itself still knows what hashes these submodules should be checked out to, though. Turns out there's a command that will do this for you:

$ git submodule update --init --recursive
Submodule path 'vim/bundle/jedi-vim': checked out 'f191ccd6fb7f3bc2272a34d6230487caf64face7'
Submodule path 'vim/bundle/jedi-vim/jedi': checked out '2ba78ab725f1e02dfef8bc50b0204cf656e8ee23'
Submodule path 'vim/bundle/nerdtree': checked out 'eee431dbd44111c858c6d33ffd366cae1f17f8b3'
Submodule path 'vim/bundle/supertab': checked out '66511772a430a5eaad7f7d03dbb02e8f33c4a641'
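If you'd rather not remember the second command, newer versions of Git can do both steps at once. The sketch below assumes Git 2.13 or later, where git checkout accepts --recurse-submodules. Everything else in it (the common and platform repositories, version.txt, the release-1 and release-2 tags) is made up for illustration and is not part of my dotfiles example.

```shell
#!/bin/sh
# Sketch: roll a repo and its submodule back to an old release in one command.
# Assumes Git >= 2.13; all repository, file and tag names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Common code, version 1.
git init -q common
echo v1 > common/version.txt
git -C common add version.txt
git -C common commit -q -m 'common v1'

# Platform release 1 pins common v1.
git init -q platform
cd platform
# The -c override is needed only because the remote here is a local path.
git -c protocol.file.allow=always submodule add "$work/common" common
git commit -q -m 'release 1: pin common v1'
git tag release-1

# Common code moves on to version 2 ...
echo v2 > "$work/common/version.txt"
git -C "$work/common" commit -q -am 'common v2'

# ... and the platform maintainer deliberately pulls it in for release 2.
git -C common pull -q --ff-only origin HEAD
git add common
git commit -q -m 'release 2: pin common v2'
git tag release-2

# Debugging the old release: check out the tag and its pinned submodule together.
git checkout -q --recurse-submodules release-1
restored=$(cat common/version.txt)
echo "common code at release-1: $restored"
```

If you want this behaviour all the time, Git 2.14 and later also have a submodule.recurse config setting (git config submodule.recurse true), but I'd test it against your own workflow before relying on it.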

To separately address your concerns:

  1. Break platform specific code away from common code so changes to common code and/or the SDK don't have unforeseen consequences on other platforms.

    Yep. Submodules.

  2. Create new repositories for each platform, each common code base and each SDK.

    This is going to depend a lot on how you've laid out your current codebase. It's possible that git subtree may have what you want. Specifically, git subtree split lets you extract files in a subtree into a separate Git repo, complete with commit history, but I don't know how well it'll work with a repo as large and as old as you seem to be describing (see man git subtree for more details).

  3. Cloning a remote repository to a local machine should be a one step process; new users should not be forced to pull each sub-repository before the platform will build.

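    Not by default. Submodules, by design, do not track the contents of a sub-repository: that's the sub-repository's job. All they do is keep a pointer to a specific commit in the sub-repo, so a plain git clone leaves the submodule directories empty until you run git submodule update --init --recursive. Passing --recurse-submodules to git clone does both in a single command, though.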

  4. In addition, restoring old code, including the common code and SDK that was used to build version XXX-YYY of any given platform, must be possible. Modifying the common code should not affect the platforms until the maintainer of the platform pulls in the new common code. (I believe this is possible with subtrees by adding one of the common code's tagged commits instead of the common code's master)

    Submodules do this by design. That's why they point to specific commits instead of a remote.

  5. A read-only sub-repository is an acceptable limitation. (i.e. the sub-repository either cannot be modified in the platform's repository or changes to the sub-repository cannot be pushed from the platform's repository)

    Submodules should generally be treated as read-only sub-repositories. Pushing updates from submodules is possible, but puts more overhead on the user for making sure they don't screw up the version of the submodule they're working with.

  6. SmartGit/HG support is desirable as all but one of our members uses it.

    No guarantees here. You'll probably have to reach out to all 3 development communities (Git, SmartGit, Mercurial) to figure this out.

  7. Scripting one-time tasks is acceptable, but I don't want to pollute platforms with scripts that do Git's job for it.

    Depends on how complicated a task you mean. I've shown above that checking out an old version of your code is just two commands, checkout and submodule update --init --recursive, but it's not clear what you're asking for.
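On the one-step clone (point 3), here's a rough sketch of what a new team member's first command could look like. It relies on git clone --recurse-submodules, which clones the repo and then initializes and checks out its submodules. As before, the common and platform repositories and the fresh directory are hypothetical, and the protocol.file.allow override is only needed because this demo uses local paths as remotes.

```shell
#!/bin/sh
# Sketch: clone a platform repo and its submodules with a single command.
# All repository and directory names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the shared common code.
git init -q common
echo 'int common_version = 1;' > common/common.c
git -C common add common.c
git -C common commit -q -m 'common v1'

# Stand-in for a platform repo that pins the common code as a submodule.
git init -q platform
(
  cd platform
  git -c protocol.file.allow=always submodule add "$work/common" common
  git commit -q -m 'pin common code'
)

# The one step for a new user: the submodule is populated as part of the clone.
git -c protocol.file.allow=always clone -q --recurse-submodules "$work/platform" fresh

ls fresh/common
```

Without --recurse-submodules, fresh/common would exist but be empty until someone ran git submodule update --init --recursive inside the clone.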

Pockets answered Sep 22 '22 14:09
