 

Is it possible to have git "overlays" (storing only the differences from external repositories in a local repository)?

Tags:

git

I would like to do something, best described in this mailing list post that I found:

git Archives: GIT overlay repositories (unsw.edu.au)

Start with two repositories, let's call them Repo-A and Repo-B. Repo-A is hosted on some server somewhere and contains lots of code (let's say it's a kernel source repository). Repo-B only adds a small number of changes to the repo (for argument's sake, let's say the IPW2100 and IPW2200 projects) on top of what is already provided by Repo-A.

For several reasons, we would like users to be able to get just the differences between Repo-A and Repo-B from me.

For example, the user gets the full Repo-A: [...]
and then overlays just the delta, which they obtain from me: [...]

The problem is that I cannot find any other references to this concept (the search is further complicated by the fact that Gentoo has something called "git overlays" in its package manager, and TortoiseGit has "overlay" icons). The thread itself is from 2005 and seems to have only one reply, which suggests introducing an "ancestors file stored on the overlay repository"; this was probably never implemented in git proper. While the post does include bash scripts demonstrating the concept, they work by rsync-ing .git internals directly, which I don't feel confident about testing.

My question is - is there a standard way (e.g. using mostly git commands, or shell scripts that would be called in context of git) in which this kind of operation can be achieved? Alternatively, are there some "filesystem overlay" tricks I could use under Linux, to achieve something to that effect?

I thought git submodules could be used, but apparently they can't; I prepared a small bash script to test that:

#!/usr/bin/env bash
set -x

rm -rf repoM-git

mkdir repoM-git
cd repoM-git
git init
git config user.name "me"
git config user.email "[email protected]"

git submodule add https://github.com/defunkt/github-gem.git repo1
git submodule add https://gist.github.com/6462971.git repo2

git status
git commit -m "initial checkin"

cd repo1
git config user.name "me"
git config user.email "[email protected]"
SOMETAG=$(git tag --list | awk 'NR==4{print $0;}')
{ echo "Checking out $SOMETAG in repo1"; } 2>/dev/null
git checkout "$SOMETAG"
{ echo "Creating myhack branch"; } 2>/dev/null
git checkout -b myhack
{ echo "Attempting to change"; } 2>/dev/null
echo "AHOOOOOY" >> README
git add -u
git status
{ echo "Commiting in submodule repo1..."; } 2>/dev/null
git commit -m "first change"
git status

{ echo "Going back to main repoM"; } 2>/dev/null
cd ..
git add -u
git status
git diff --cached

Running this script produces the following output at the end:

HEAD is now at b6df531... Bump the version to 0.1.3
Creating myhack branch
+ git checkout -b myhack
Switched to a new branch 'myhack'
Attempting to change
+ echo AHOOOOOY
+ git add -u
+ git status
# On branch myhack
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified:   README
#
Commiting in submodule repo1...
+ git commit -m 'first change'
[myhack 0e01195] first change
 1 file changed, 1 insertion(+)
+ git status
# On branch myhack
nothing to commit (working directory clean)
Going back to main repoM
+ cd ..
+ git add -u
+ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified:   repo1
#
+ git diff --cached
diff --git a/repo1 b/repo1
index 8ef0c30..0e01195 160000
--- a/repo1
+++ b/repo1
@@ -1 +1 @@
-Subproject commit 8ef0c3087d2e5d1f6fe328c06974d787b47df423
+Subproject commit 0e01195675f2e1585cdbdffb9fffb3cca2e5f547

This basically confirms that the submodule is its own repository/work area, with its own .git directory. What I want instead is for my "master" repository to record the changes to any "child" repositories it includes. For instance, in the example above, I'd want repoM to track not just that I've made a change in repo1 (which originally comes from elsewhere) relative to its tag 'v0.1.3' (i.e., its underlying SHA-1 commit hash), but also to record the changes (the diff) themselves. Is this possible, with submodules or otherwise?
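A minimal sketch of the behaviour described above, assuming a reasonably recent Git and using throwaway local repositories instead of the GitHub ones (the names `upstream`, `repoM`, `repo1` and the `me@example.com` identity are placeholders). It shows that the superproject's tree stores only a gitlink entry for the submodule (mode `160000`, type `commit`, plus a commit ID), while the actual change lives in the submodule's own object database. The `protocol.file.allow=always` setting is only there because newer Git versions refuse to add a submodule from a local path by default.

```shell
#!/usr/bin/env bash
# Sketch: show what a superproject actually records for a submodule.
# Everything is local (no network); repository names are hypothetical.
set -euo pipefail

# Placeholder identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=me GIT_AUTHOR_EMAIL=me@example.com
export GIT_COMMITTER_NAME=me GIT_COMMITTER_EMAIL=me@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in for the external repository (repo1 in the script above).
git init -q upstream
git -C upstream commit -q --allow-empty -m "upstream commit"

# The "master" repository that includes it as a submodule.
git init -q repoM
cd repoM
git -c protocol.file.allow=always submodule --quiet add "$work/upstream" repo1
git commit -q -m "add submodule"

# Make and commit a change inside the submodule.
echo "AHOOOOOY" >> repo1/README
git -C repo1 add README
git -C repo1 commit -q -m "first change"

# Record the new submodule state in the superproject.
git add repo1
git commit -q -m "bump repo1"

# The superproject's tree holds only a gitlink: mode 160000 and a commit ID.
git ls-tree HEAD repo1
# The change itself is stored only in repo1's own repository.
git -C repo1 show --stat --format=%s HEAD
```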

asked Feb 01 '15 at 14:02 by sdaau


1 Answer

Git is already well-suited to what you want to do, even without any extensions.

Here is one way that I might maintain my own fork of an upstream repository, using GitHub's hub repo as an example:

  1. Clone the upstream repository and rename its remote.

    git clone git@github.com:github/hub.git
    cd hub
    git remote rename origin upstream
    

    At this point my repository will look something like this:

              D---E  [master][upstream/master]
             /
    A---B---C  [tag:v1.12.4]
    

    Note that I have included the most recent tag, v1.12.4, in my diagram. It's always a good idea to start working from a known state.

  2. Get to a known state.

    I'll work from one of hub's releases, so I need to move my master branch to the v1.12.4 tag before I start:

    git reset --hard v1.12.4
    
  3. Make some changes.

    After a while, my repository may look something like this:

              D---E  [upstream/master]
             /
    A---B---C  [tag:v1.12.4]
             \
              1---2---3 [master]
    
  4. Publish.

    Whenever you are ready, push your master branch (and any tags you create) to a public repository, from which users can retrieve your source code. Because commits A, B and C exist both in your repository and in the upstream repository, someone who has already cloned the upstream repository only needs to fetch your new commits, perhaps into a local branch such as sdaau-master.

  5. Update.

    Your changes are relative to tag v1.12.4, but what happens when the upstream repository changes? Let's say they've released a new version v1.13 and you want to support that as well.

    Easy: just run git fetch upstream to get the new changes...

                                I---J---K  [upstream/master]
                               /
              D---E---F---G---H  [tag:v1.13]
             /
    A---B---C  [tag:v1.12.4]
             \
              1---2---3  [master]
    

    ...and merge them into your master branch with git merge v1.13:

                                I---J---K  [upstream/master]
                               /
              D---E---F---G---H  [tag:v1.13]
             /                 \
    A---B---C  [tag:v1.12.4]    \
             \                   \
              1---2---3-----------4  [master]
    
  6. Rinse and repeat.

                                                  N---O [upstream/master]
                                                 /
                                I---J---K---L---M  [tag:v1.13.1]
                               /                 \
              D---E---F---G---H  [tag:v1.13]      \
             /                 \                   \
    A---B---C  [tag:v1.12.4]    \                   \
             \                   \                   \
              1---2---3-----------4---5---6---7-------8---9  [master]
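The steps above can be rehearsed offline with a sketch like the following. It is a stand-in under several assumptions: a throwaway local repository plays the role of github/hub, its commits are empty placeholders named after the letters in the diagrams, and `my-feature.txt` and the `me@example.com` identity are hypothetical. Step 4 (publishing) is left out because it depends on where you host your fork.

```shell
#!/usr/bin/env bash
# Sketch of the fork-and-merge workflow, run against a local stand-in
# "upstream" instead of github/hub. File names are hypothetical.
set -euo pipefail

export GIT_AUTHOR_NAME=me GIT_AUTHOR_EMAIL=me@example.com
export GIT_COMMITTER_NAME=me GIT_COMMITTER_EMAIL=me@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in upstream: commits A, B, C tagged v1.12.4, then D and E.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
for c in A B C; do git -C upstream commit -q --allow-empty -m "$c"; done
git -C upstream tag v1.12.4
for c in D E; do git -C upstream commit -q --allow-empty -m "$c"; done

# 1. Clone the upstream repository and rename its remote.
git clone -q upstream mine
cd mine
git remote rename origin upstream

# 2. Get to a known state.
git reset -q --hard v1.12.4

# 3. Make some changes (commits 1, 2, 3).
for n in 1 2 3; do
  echo "change $n" >> my-feature.txt
  git add my-feature.txt
  git commit -q -m "$n"
done

# 5. Upstream releases v1.13 (commits F, G, H); fetch it...
for c in F G H; do git -C ../upstream commit -q --allow-empty -m "$c"; done
git -C ../upstream tag v1.13
git fetch -q upstream

# ...and merge the release into master (merge commit 4).
git merge -q --no-edit v1.13

# Commits that exist only in your fork: your "overlay".
git log --oneline upstream/master..master
```

The final `git log upstream/master..master` lists exactly the commits that exist only in your fork (1, 2, 3 and the merge), which is the delta the question is after.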
    

Some benefits of this approach are listed below:

  • Throughout all of this, your changes remain in your own branch. Of course, you can create as many of your own branches and tag as many releases as you want. Depending on the complexity of the project, this is probably a good idea.

  • Your work remains linked to the upstream repository. You can update your code when the upstream project gets updated, and it becomes very easy for other users to pull in your changes.

  • You can contribute upstream. This configuration also lets you submit patches to the upstream project quite easily. You may do this through a GitHub "fork", with their proprietary pull requests, or using standard Git commands like bundle, format-patch, apply and am.

  • Explicit relationship. Looking at the network graphs, it becomes very clear that your work is your own, and that it is based upon the upstream project.
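Since the question asks for users to obtain only the delta from you, the `bundle` route mentioned above can be sketched like this. Again this is a local-only illustration: `repo-a`, `fork`, `user`, `ipw2100.c`, `delta.bundle` and the identity are placeholder names, and the tag `v1.0` stands in for whatever upstream release your work is based on.

```shell
#!/usr/bin/env bash
# Sketch: ship only your own commits as a git bundle and apply them on
# top of a plain upstream clone. All names here are hypothetical.
set -euo pipefail

export GIT_AUTHOR_NAME=me GIT_AUTHOR_EMAIL=me@example.com
export GIT_COMMITTER_NAME=me GIT_COMMITTER_EMAIL=me@example.com

work=$(mktemp -d)
cd "$work"

# Repo-A: the large upstream repository, tagged at a known release.
git init -q repo-a
git -C repo-a symbolic-ref HEAD refs/heads/master
for c in A B C; do git -C repo-a commit -q --allow-empty -m "$c"; done
git -C repo-a tag v1.0

# Your fork: upstream plus a commit of your own.
git clone -q repo-a fork
echo "driver code" > fork/ipw2100.c
git -C fork add ipw2100.c
git -C fork commit -q -m "add ipw2100"

# Bundle only what is reachable from master but not from v1.0.
git -C fork bundle create "$work/delta.bundle" v1.0..master

# A user who already has Repo-A checks the bundle's prerequisites...
git clone -q repo-a user
cd user
git bundle verify "$work/delta.bundle"
# ...and fetches the delta into a local branch.
git fetch -q "$work/delta.bundle" master:sdaau-master
git log --oneline v1.0..sdaau-master
```

A bundle created from `v1.0..master` carries only the objects reachable from `master` but not from `v1.0`, so `git bundle verify` will reject it in a clone that lacks the `v1.0` commit.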

The only real drawback is bandwidth, which can be mitigated by hosting your repository on a service like GitHub, GitLab or Bitbucket.

answered Nov 15 '22 at 17:11 by Chris