Possibility for git "overlays" (storing only differences to extern repositories in a local repository)?

Tags:

git

I would like to do something, best described in this mailing list post that I found:

git Archives: GIT overlay repositories (unsw.edu.au)

Start with two repositories, let's call them Repo-A and Repo-B. Repo-A is hosted on some server somewhere and contains lots of code (let's say it's a kernel source repository). Repo-B only adds a small number of changes (for argument's sake, let's say the IPW2100 and IPW2200 projects) on top of what is already provided by Repo-A.

For several reasons, we would like users to be able to get just the differences between Repo-A and Repo-B from me.

For example, the user gets the full Repo-A: [...]
and then overlays just the delta, which they obtain from me: [...]

The problem is that I cannot find any other references to this concept (the search is further frustrated by the fact that Gentoo has something called "git overlays" in its package manager, and TortoiseGIT has "overlay" icons). The thread is from 2005, seems to have only one reply, and suggests introducing an "ancestors file stored on the overlay repository", which was probably never implemented in git proper. While the posting does include bash scripts demonstrating the concept, they work by rsync-ing .git internals directly, which I don't really feel confident about testing.

My question is: is there a standard way (e.g. using mostly git commands, or shell scripts called in the context of git) to achieve this kind of operation? Alternatively, are there some "filesystem overlay" tricks I could use under Linux to achieve something to that effect?

I thought git submodules could be used, but apparently they can't; I prepared a small bash script to test that:

```
#!/usr/bin/env bash
set -x

rm -rf repoM-git

mkdir repoM-git
cd repoM-git
git init
git config user.name "me"
git config user.email "[email protected]"

git submodule add https://github.com/defunkt/github-gem.git repo1
git submodule add https://gist.github.com/6462971.git repo2

git status
git commit -m "initial checkin"

cd repo1
git config user.name "me"
git config user.email "[email protected]"
SOMETAG=$(git tag --list | awk 'NR==4{print $0;}')
{ echo "Checking out $SOMETAG in repo1"; } 2>/dev/null
git checkout "$SOMETAG"
{ echo "Creating myhack branch"; } 2>/dev/null
git checkout -b myhack
{ echo "Attempting to change"; } 2>/dev/null
echo "AHOOOOOY" >> README
git add -u
git status
{ echo "Commiting in submodule repo1..."; } 2>/dev/null
git commit -m "first change"
git status

{ echo "Going back to main repoM"; } 2>/dev/null
cd ..
git add -u
git status
git diff --cached
```

Running this script produces the following output at the end:

```
HEAD is now at b6df531... Bump the version to 0.1.3
Creating myhack branch
+ git checkout -b myhack
Switched to a new branch 'myhack'
Attempting to change
+ echo AHOOOOOY
+ git add -u
+ git status
# On branch myhack
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified:   README
#
Commiting in submodule repo1...
+ git commit -m 'first change'
[myhack 0e01195] first change
 1 file changed, 1 insertion(+)
+ git status
# On branch myhack
nothing to commit (working directory clean)
Going back to main repoM
+ cd ..
+ git add -u
+ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified:   repo1
#
+ git diff --cached
diff --git a/repo1 b/repo1
index 8ef0c30..0e01195 160000
--- a/repo1
+++ b/repo1
@@ -1 +1 @@
-Subproject commit 8ef0c3087d2e5d1f6fe328c06974d787b47df423
+Subproject commit 0e01195675f2e1585cdbdffb9fffb3cca2e5f547
```

This basically confirms that the submodule is its own repository/work tree, with its own .git directory. What I'd want instead is for my "master" repository to record the changes to any "child" repositories it includes. For instance, in the example above, I'd want repoM to track not just that I've made a change in repo1 (which originally comes from elsewhere) relative to its tag 'v0.1.3' (i.e., its underlying SHA-1 commit hash), but also to record the changes (the diff) themselves. Is this possible, with submodules or otherwise?


asked Feb 01 '15 14:02

sdaau


Git is already well-suited to what you want to do, even without any extensions.

Here is one way that I might maintain my own fork of an upstream repository, using GitHub's hub repo as an example:

Clone the upstream repository and rename its remote.
```
git clone git@github.com:github/hub.git
cd hub
git remote rename origin upstream
```
At this point my repository will look something like this:
```
          D---E  [master][upstream/master]
         /
A---B---C  [tag:v1.12.4]
```
Note that I have included the most recent tag, v1.12.4, in my diagram. It's always a good idea to start working from a known state.

Get to a known state.

I'll work from one of hub's releases, so I need to move my master branch to the v1.12.4 tag before I start:
```
git reset --hard v1.12.4
```

Make some changes.

After a while, my repository may look something like this:

```
          D---E  [upstream/master]
         /
A---B---C  [tag:v1.12.4]
         \
          1---2---3 [master]
```
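To see exactly which commits make up your "overlay", ask Git for the commits reachable from master but not from the release tag. The sketch below is self-contained so it runs offline: a local scratch repository stands in for the upstream server, and the `commit` helper, file names, commit messages and author identity are all hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Throwaway identity so the sketch runs without any global git config.
export GIT_AUTHOR_NAME=me GIT_AUTHOR_EMAIL=me@example.com \
       GIT_COMMITTER_NAME=me GIT_COMMITTER_EMAIL=me@example.com
# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$2" >> "$1"; git add "$1"; git commit -qm "$2"; }
WORK=$(mktemp -d)   # remove this directory when you are done

# Stand-in for upstream: commits A, B, C with C tagged v1.12.4.
git init -q "$WORK/upstream"
cd "$WORK/upstream"
git symbolic-ref HEAD refs/heads/master
for c in A B C; do commit upstream.txt "$c"; done
git tag v1.12.4

# Your clone, with local commits 1, 2 and 3 on master.
git clone -q "$WORK/upstream" "$WORK/mine"
cd "$WORK/mine"
git remote rename origin upstream
for c in 1 2 3; do commit mine.txt "change $c"; done

# Only your own work: commits on master that are not part of the release.
git log --oneline v1.12.4..master
# The same delta, summarised as a diffstat against the release.
git diff --stat v1.12.4 master
```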

Publish.

Whenever you are ready, users can use your master branch, or any tags you create, to retrieve your source code. Because commits A, B and C exist both in your repository and in the upstream repository, somebody who has already cloned the upstream repository can simply fetch your changes, perhaps into a branch named sdaau-master; Git only needs to transfer the commits they do not have yet (1, 2 and 3).
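On the user's side, this is an ordinary fetch into an existing clone of upstream. In this sketch, local scratch repositories stand in for both servers; the remote name `sdaau`, the `commit` helper and the file names are hypothetical, and `sdaau-master` is the branch name mentioned above. Since the user already has A, B and C, the fetch should only need to send commits 1, 2 and 3:

```shell
#!/usr/bin/env bash
set -euo pipefail
# Throwaway identity so the sketch runs without any global git config.
export GIT_AUTHOR_NAME=me GIT_AUTHOR_EMAIL=me@example.com \
       GIT_COMMITTER_NAME=me GIT_COMMITTER_EMAIL=me@example.com
# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$2" >> "$1"; git add "$1"; git commit -qm "$2"; }
WORK=$(mktemp -d)   # remove this directory when you are done

# Stand-in for upstream: A, B, C tagged v1.12.4.
git init -q "$WORK/upstream"
cd "$WORK/upstream"
git symbolic-ref HEAD refs/heads/master
for c in A B C; do commit upstream.txt "$c"; done
git tag v1.12.4

# Stand-in for your published fork: upstream plus commits 1, 2, 3.
git clone -q "$WORK/upstream" "$WORK/fork"
cd "$WORK/fork"
for c in 1 2 3; do commit fork.txt "change $c"; done

# A user who already has a clone of upstream...
git clone -q "$WORK/upstream" "$WORK/user"
cd "$WORK/user"
# ...adds your repository as a second remote and fetches your master
# into a local branch called sdaau-master.
git remote add sdaau "$WORK/fork"
git fetch -q sdaau master:sdaau-master
# Commits on sdaau-master that the user did not already have.
git log --oneline master..sdaau-master
```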

Update.

Your changes are relative to tag v1.12.4, but what happens when the upstream repository changes? Let's say they've released a new version v1.13 and you want to support that as well.

Easy: just run git fetch upstream to get the new changes...

```
                            I---J---K  [upstream/master]
                           /
          D---E---F---G---H  [tag:v1.13]
         /
A---B---C  [tag:v1.12.4]
         \
          1---2---3  [master]
```

...and merge them into your master branch with git merge v1.13:

```
                            I---J---K  [upstream/master]
                           /
          D---E---F---G---H  [tag:v1.13]
         /                 \
A---B---C  [tag:v1.12.4]    \
         \                   \
          1---2---3-----------4  [master]
```
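The update cycle can be tried out offline in the same way. Here, hypothetical scratch repositories replace the real servers: upstream gains commits D to H and tags H as v1.13, and your master then merges that tag. A plain `git fetch upstream` is enough to get v1.13, because by default Git also fetches tags that point into the history being fetched. The `commit` helper and file names are, again, made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Throwaway identity so the sketch runs without any global git config.
export GIT_AUTHOR_NAME=me GIT_AUTHOR_EMAIL=me@example.com \
       GIT_COMMITTER_NAME=me GIT_COMMITTER_EMAIL=me@example.com
# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$2" >> "$1"; git add "$1"; git commit -qm "$2"; }
WORK=$(mktemp -d)   # remove this directory when you are done

# Stand-in for upstream: A, B, C tagged v1.12.4.
git init -q "$WORK/upstream"
cd "$WORK/upstream"
git symbolic-ref HEAD refs/heads/master
for c in A B C; do commit upstream.txt "$c"; done
git tag v1.12.4

# Your repository: start from the release and add commits 1, 2, 3.
git clone -q "$WORK/upstream" "$WORK/mine"
cd "$WORK/mine"
git remote rename origin upstream
git reset -q --hard v1.12.4
for c in 1 2 3; do commit mine.txt "change $c"; done

# Meanwhile upstream moves on and releases v1.13.
cd "$WORK/upstream"
for c in D E F G H; do commit upstream.txt "$c"; done
git tag v1.13

# Update: fetch the new history (v1.13 comes along), then merge the release.
cd "$WORK/mine"
git fetch -q upstream
git merge -q --no-edit v1.13
git log --oneline --graph -n 10
```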

Rinse and repeat.

```
                                              N---O [upstream/master]
                                             /
                            I---J---K---L---M  [tag:v1.13.1]
                           /                 \
          D---E---F---G---H  [tag:v1.13]      \
         /                 \                   \
A---B---C  [tag:v1.12.4]    \                   \
         \                   \                   \
          1---2---3-----------4---5---6---7-------8---9  [master]
```

Some benefits of this approach are listed below:

- Throughout all of this, your changes remain in your own branch. Of course, you can create as many of your own branches and tag as many releases as you want. Depending on the complexity of the project, this is probably a good idea.
- Your work remains linked to the upstream repository. You can update your code when the upstream project gets updated, and it becomes very easy for other users to pull in your changes.
- You can contribute upstream. This configuration also lets you submit patches to the upstream project quite easily. You may do this through a GitHub "fork", with their proprietary pull requests, or using standard Git commands like bundle, format-patch, apply and am.
- Explicit relationship. Looking at the network graphs, it becomes very clear that your work is your own, and that it is based upon the upstream project.
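The `bundle` and `format-patch` commands are also the closest match to the "overlay" idea from the original mailing-list post: each packages only the commits after v1.12.4 into files that someone who already has upstream can apply. A sketch, again using hypothetical scratch repositories, helper and output names (`delta.bundle`, `patches/`):

```shell
#!/usr/bin/env bash
set -euo pipefail
# Throwaway identity so the sketch runs without any global git config.
export GIT_AUTHOR_NAME=me GIT_AUTHOR_EMAIL=me@example.com \
       GIT_COMMITTER_NAME=me GIT_COMMITTER_EMAIL=me@example.com
# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$2" >> "$1"; git add "$1"; git commit -qm "$2"; }
WORK=$(mktemp -d)   # remove this directory when you are done

# Stand-in for upstream (A, B, C tagged v1.12.4) and your work (+1, 2, 3).
git init -q "$WORK/upstream"
cd "$WORK/upstream"
git symbolic-ref HEAD refs/heads/master
for c in A B C; do commit upstream.txt "$c"; done
git tag v1.12.4
git clone -q "$WORK/upstream" "$WORK/mine"
cd "$WORK/mine"
for c in 1 2 3; do commit mine.txt "change $c"; done

# Package only the commits after v1.12.4: as a bundle, and as mailable patches.
git bundle create "$WORK/delta.bundle" v1.12.4..master
git format-patch -q -o "$WORK/patches" v1.12.4..master

# Someone with their own clone of upstream...
git clone -q "$WORK/upstream" "$WORK/user"
cd "$WORK/user"
# ...checks they have the commits the bundle builds on, then fetches from it,
git bundle verify "$WORK/delta.bundle"
git fetch -q "$WORK/delta.bundle" master:sdaau-master
# ...or replays the patches on a branch started at the same release.
git checkout -q -b patched v1.12.4
git am -q "$WORK"/patches/*.patch
```

`git bundle verify` fails if the recipient's repository lacks the commits the bundle is based on (here, C), which makes it a cheap check before fetching.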

The only real drawback is bandwidth: anyone who clones your repository from scratch also downloads the full upstream history from you. This can be mitigated by hosting your repository on a service like GitHub, GitLab or Bitbucket.


answered Nov 15 '22 17:11

Chris

