
Git subtree merge strategy or subtree command?

I'm starting a new Zend Framework project on which I will collaborate with a designer. I'm going to maintain the project's code using git, and designers usually don't speak git (or any programming language), so I want to make things easy for him; otherwise I'm afraid he won't use git at all. My plan is to give him a Git GUI, with which he should need only basic git features such as commit, diff, fetch, merge, push and pull.

I'm using gitolite to maintain the shared copy of our git repository and since it has a granular permission system, I will give the designer RW access only for a dedicated branch (design) and read access to other branches.

To keep things simple, I'd like to share with him only the folders of the main project (which follows the ZF recommended structure) that he needs to do his job. At the same time, I want both of us to still be able to merge from each other.

The reduced structure for his branch should be this:

<project name>/

I know that I could use submodules for this task, but that would be a pain to maintain: I would have to split my project into (at least) 4 subrepositories, he should have access only to the subrepositories, and he'd have 3 repositories to work with. For this reason, if this is the only solution, I'll give up on this idea.

Some links I've already read that make me think that what I'm asking is possible:

  • git subtree command
  • git subtree merge strategy

Here are my questions:

  1. How to create the reduced branch design (git checkout -b design and git mv/rm?)
  2. How to configure git to keep track of edits across branches (so I can git merge design from the master branch and vice versa)


I found another possible approach to the problem, suggested by these two SO questions:

  • How do I tell git to always select my local version for conflicted merges on a specific file?
  • How to setup a git driver to ignore a folder on merge

I tried to implement the first one. After running git rm all-unneeded-stuff in the design branch, I made a commit in the master branch that touches one file in the whitelisted paths and another file in the blacklisted paths, but git merge fails with the following message:

CONFLICT (delete/modify): application/Bootstrap.php deleted in HEAD and modified in master. Version master of application/Bootstrap.php left in tree.

Then I added a new dir in the master branch, and when merging into design the new dir was added. I put some debug echo statements in the driver and saw that it wasn't called in either case, maybe because it's not a real merge.

I haven't tried the second approach (the .gitignore one) yet, but if I've understood it correctly, it doesn't fit my needs: it will only ignore blacklisted files in the design branch, but they will still be checked out there, which breaks my requirements.

I pushed my experiments to GitHub.

Update 2:

I think that currently there is no solution for that. With the current git implementation this is simply not achievable.

I'd like to be contradicted, but I'm afraid that it won't happen.

Fabio asked Jul 05 '11 21:07


1 Answer

Sounds like you want to be able to restrict read access on a per-directory basis. This is possible, but the only solution I'm aware of is far from simple. It involves multiple versions of the same repository on your server, each kept in sync using some complicated hook magic to filter out the subdirectories.

I'm working on implementing the hooks in my spare time with the eventual goal of publishing them as open source software (perhaps as a feature addition to gitolite), but unfortunately my spare time is limited.


The general solution involves at least three variants of the same repository: One authority repository that coordinates two or more delegate repositories. Users never clone the authority repository; only delegate repositories are cloned.

The delegates are responsible for forwarding incoming commits to the authority repository. The authority repository is responsible for filtering the incoming commits appropriately for each other delegate repository. The results are then pushed down to the other delegates.

The authority repository isn't strictly required—delegates could perform the filtering on their own and then push the results directly to the other delegates—but using another repository as a centralized coordinator simplifies implementation considerably.

Delegate Repositories

Each delegate repository contains a subset of the entire project's data (e.g., zero or more subdirectories filtered out). All delegate repositories are identical to each other except each delegate has a different set of files filtered out. They all have the same commit history graph, but the commits will have different file contents and thus different SHA1 identifiers. They have the same set of branches and tags (in other words, if the project has a master branch, then each delegate repository also has a master branch), but because the SHA1 identifiers for the equivalent commits are different, the references will point to different SHA1 identifiers.

For example, the following are graphs of the contents of two delegate repositories. The everything.git repository doesn't have anything filtered out, but the no-foo.git repository has everything in subdirectory foo filtered out.

$ cd ~git/repositories/everything.git
$ git log --graph --oneline --decorate --date-order --all
* 2faaad9 (HEAD, master) barbaz
| * c3eb6a9 (release) foobar
* |   8b56913 Merge branch 'release'
|\ \  
| |/  
| * b8f899c qux
* | aad30f1 baz
* f4acd9f put a new file in subdirectory bar
* 2a15586 put a new file in subdirectory foo

$ cd ~git/repositories/no-foo.git
$ git log --graph --oneline --decorate --date-order --all
* 81c2189 (HEAD, master) barbaz
| * 6bbd85f (release) foobar
* |   c579c4b Merge branch 'release'
|\ \  
| |/  
| * 42c45c7 qux
* | 90ecdc7 baz
* 4d1cd8d put a new file in subdirectory bar
* 9cc719d put a new file in subdirectory foo

Notice that the two graphs look the same and have the same commit messages, the same branch names, etc. The only difference is the SHA1 IDs, which differ because the file contents are different.

(Side note: Commits can be filtered out as well to prevent users of another delegate from even knowing that a commit in a filtered-out directory was made. However, a commit can only be filtered out if it only touches files in a filtered-out directory. Otherwise, there would be merge conflicts that could not be automatically resolved by the hooks.)

Authority Repository

The authority repository is a superset of all of the delegate repositories. All commit objects in each delegate repository are automatically pushed into the authority repository via a hook in each delegate repository. Thus, if there are two delegate repositories, there will be two isomorphic DAGs (one from each delegate) in the authority repository (assuming the delegates don't share a common root commit).

The authority repository will also have a version of each project branch from each delegate, prefixed by the name of the delegate. Continuing the above example, the everything.git delegate repository has a master branch pointing to commit 2faaad9, while delegate no-foo.git has a master branch pointing to the filtered-but-otherwise-equivalent commit 81c2189. In this scenario, authority.git would have two master branches: everything/master pointing to 2faaad9 and no-foo/master pointing to 81c2189. The following graph illustrates this.

$ cd ~git/repositories/authority.git
$ git log --graph --oneline --decorate --date-order --all
* 2faaad9 (everything/master) barbaz
| * 81c2189 (no-foo/master) barbaz
| | * c3eb6a9 (everything/release) foobar
| | | * 6bbd85f (no-foo/release) foobar
* | | |   8b56913 Merge branch 'release'
|\ \ \ \  
| | |/ /  
| |/| |   
| | * |   c579c4b Merge branch 'release'
| | |\ \  
| | | |/  
| * | | b8f899c qux
| | | * 42c45c7 qux
* | | | aad30f1 baz
|/ / /  
| * | 90ecdc7 baz
| |/  
* | f4acd9f put a new file in subdirectory bar
| * 4d1cd8d put a new file in subdirectory bar
* | 2a15586 put a new file in subdirectory foo
* 9cc719d put a new file in subdirectory foo

Notice that there are two versions of each commit, one for each delegate. Also notice the branch names.


Delegate Repositories

Each delegate feeds commits to the authority repository.

When a user updates a reference (via git push) in a delegate repository, that repository's update hook automatically does a git push into the authority repository. However, instead of using the standard push refspec, it uses a refspec that causes the reference in the authority's repository to be prefixed by the delegate repository's name (e.g., if the delegate repository is named foo.git then it will use push refspecs like +refs/heads/master:refs/heads/foo/master and +refs/tags/v1.0:refs/tags/foo/v1.0).
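As a small illustration of the prefixed refspecs described above, the following sketch builds the refspec string for a given delegate name and ref. The function name prefixed_refspec and its argument order are hypothetical; this is not taken from any published hook.

```shell
#!/bin/sh
# Build the push refspec a delegate could use when forwarding a ref to the
# authority repository. Hypothetical helper, for illustration only.
#   $1: delegate name (e.g. "foo" for foo.git)
#   $2: full ref name (e.g. "refs/heads/master")
prefixed_refspec() {
    delegate=$1
    ref=$2
    case $ref in
        refs/heads/*)
            printf '+%s:refs/heads/%s/%s\n' "$ref" "$delegate" "${ref#refs/heads/}" ;;
        refs/tags/*)
            printf '+%s:refs/tags/%s/%s\n' "$ref" "$delegate" "${ref#refs/tags/}" ;;
        *)
            echo "unsupported ref: $ref" >&2
            return 1 ;;
    esac
}

prefixed_refspec foo refs/heads/master   # prints +refs/heads/master:refs/heads/foo/master
prefixed_refspec foo refs/tags/v1.0      # prints +refs/tags/v1.0:refs/tags/foo/v1.0
```

One detail to keep in mind: git runs the update hook just before the reference is actually updated, so a hook that pushes from there would probably need to use the new commit ID (the hook's third argument) as the left-hand side of the refspec instead of the ref name.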

Authority Repository

The authority repository filters incoming commits and pushes them down into the other delegate repositories.

When a delegate repository pushes into the authority repository, the authority's update hook:

  1. Checks to see if the user is trying to create a file in one of the filtered-out directories. If so, it exits with an error (otherwise there could be merge conflicts which can't be resolved automatically).
  2. Grafts back in the subdirectories that were originally filtered out to form a tree that has nothing filtered out.
  3. Filters the unfiltered tree for each other delegate to make an equivalent commit with the appropriate contents removed.
  4. Pushes the equivalent commits to the delegate repositories.
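The check in step 1 could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names in_filtered_dir and check_new_paths are hypothetical, and the list of newly created paths is assumed to arrive on standard input, one path per line.

```shell
#!/bin/sh
# Succeeds (exit status 0) when path $1 lies inside one of the directories
# given as the remaining arguments. Hypothetical helper.
in_filtered_dir() {
    path=$1
    shift
    for dir in "$@"; do
        case $path in
            "$dir"/*) return 0 ;;
        esac
    done
    return 1
}

# Reads newly created paths on stdin, one per line, and fails on the first
# one that falls inside a filtered-out directory (passed as arguments).
check_new_paths() {
    while IFS= read -r path; do
        if in_filtered_dir "$path" "$@"; then
            echo "error: cannot create $path in a filtered-out directory" >&2
            return 1
        fi
    done
    return 0
}
```

In a hook, the input could come from something like git diff-tree -r --name-only --diff-filter=A OLD NEW, which lists the files added between two commits.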

Care must be taken to avoid race conditions between delegate repositories and to properly handle errors.

Your Case

In your example, you would have two delegate repositories like this:

  • everything.git (for you)
  • zend-project.git (for your designer)

Branches in authority.git would be prefixed by everything and zend-project corresponding to the two delegate repositories.

When you push to master in everything.git, the following would happen:

  1. The update hook in everything.git would push the incoming commits to the everything/master branch in authority.git.
  2. For each incoming commit, the update hook in authority.git would:
    1. Create a new tree object that is 100% identical to the commit's tree but remove everything outside of the application and public subdirectories.
    2. Create a new commit object using the new tree and equivalent parent(s), but reuse the original commit message, author, and timestamp.
    3. Update zend-project/master to point to the new commit.
  3. Push zend-project/master in authority.git to master in zend-project.git.
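The per-commit work in step 2 can be sketched with git plumbing commands. This is a minimal sketch under assumptions, not the hooks described in this answer: the function names filtered_tree and equivalent_commit are hypothetical, the commands are assumed to run inside the repository, and the caller is assumed to supply the already rewritten parent IDs. Reusing the original committer identity and date is also an assumption, made so that the result does not depend on who runs the hook.

```shell
#!/bin/sh
# Print the ID of a tree that contains only the given top-level directories
# of a commit.
#   $1: commit to filter; remaining arguments: directories to keep
filtered_tree() {
    commit=$1
    shift
    tmp=$(mktemp -d) || return 1
    (
        # Work in a throwaway index so the real index is never touched.
        export GIT_INDEX_FILE="$tmp/index"
        git read-tree --empty
        for dir in "$@"; do
            # Skip directories that do not exist in this commit.
            if git rev-parse -q --verify "$commit:$dir" >/dev/null 2>&1; then
                git read-tree --prefix="$dir/" "$commit:$dir"
            fi
        done
        git write-tree
    )
    rm -rf "$tmp"
}

# Print the ID of a new commit that uses tree $2 but keeps the message,
# author, and dates of commit $1. Remaining arguments are the parent IDs.
equivalent_commit() {
    commit=$1
    tree=$2
    shift 2
    parent_args=
    for parent in "$@"; do
        parent_args="$parent_args -p $parent"
    done
    # A raw commit object is "headers, blank line, message": drop the headers.
    git cat-file commit "$commit" | sed -e '1,/^$/d' |
        GIT_AUTHOR_NAME=$(git log -1 --format=%an "$commit") \
        GIT_AUTHOR_EMAIL=$(git log -1 --format=%ae "$commit") \
        GIT_AUTHOR_DATE=$(git log -1 --format=%aD "$commit") \
        GIT_COMMITTER_NAME=$(git log -1 --format=%cn "$commit") \
        GIT_COMMITTER_EMAIL=$(git log -1 --format=%ce "$commit") \
        GIT_COMMITTER_DATE=$(git log -1 --format=%cD "$commit") \
        git commit-tree "$tree" $parent_args
}
```

For example, tree=$(filtered_tree "$commit" application public) followed by new=$(equivalent_commit "$commit" "$tree" "$new_parent") would produce one filtered commit; git update-ref could then move zend-project/master to it.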

When your designer pushes to master in zend-project.git, the following would happen:

  1. The update hook in zend-project.git would push the incoming commits to the zend-project/master branch in authority.git.
  2. For each incoming commit, the update hook in authority.git would:
    1. Check to see if any new files were created outside the application or public subdirectories. If so, return with an error message.
    2. Create a new tree object that is 100% identical to the commit's tree except with the other subdirectories from everything/master grafted in.
    3. Create a new commit object using the new tree and equivalent parent(s), but reuse the original commit message, author, and timestamp.
    4. Update everything/master to point to the new commit.
  3. Push everything/master in authority.git to master in everything.git.
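The grafting step (2.2) could be sketched as follows. Again, this is only an illustration under assumptions: the function name grafted_tree is hypothetical, the shared directories are assumed to be top-level directories with plain names (no characters that git ls-tree would quote), and the commands are assumed to run from the top level of the repository.

```shell
#!/bin/sh
# Print the ID of a tree that equals the full tree, except that the shared
# top-level directories are taken from the filtered side.
#   $1: commit or tree with nothing filtered out (e.g. everything/master)
#   $2: commit or tree pushed by the filtered delegate
#   remaining arguments: the shared top-level directories
grafted_tree() {
    full=$1
    filtered=$2
    shift 2
    tab=$(printf '\t')
    {
        # Keep the top-level entries of the full tree that are not shared.
        git ls-tree "$full" | while IFS= read -r entry; do
            name=${entry#*"$tab"}
            keep=yes
            for dir in "$@"; do
                if [ "$name" = "$dir" ]; then
                    keep=no
                fi
            done
            if [ "$keep" = yes ]; then
                printf '%s\n' "$entry"
            fi
        done
        # Take the shared directories from the filtered side.
        git ls-tree "$filtered" "$@"
    } | git mktree
}
```

For example, grafted_tree everything/master "$commit" application public would print a tree ID that can be passed to git commit-tree together with the equivalent parent(s). git mktree normalizes the order of the entries, so the two listings do not need to be sorted first.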


The above describes a way to implement per-directory read access control. It should be suitable if you really don't want certain users to be able to access parts of the repository. In your case, convenience for your designer may be more important than limiting access. If so, there may be a simpler way to accomplish what you want.

I hope I was able to explain this clearly enough.

Richard Hansen answered Oct 05 '22 10:10