Merge git repository in subdirectory

Tags:

git

I'd like to merge a remote git repository into my working git repository as a subdirectory of it. The resulting repository should contain the merged history of the two repositories, and each file of the merged-in repository should retain its history as it was in the remote repository. I tried the subtree strategy as described in "How to use the subtree merge strategy", but after following that procedure, although the resulting repository does contain the merged history of the two repositories, individual files coming from the remote one have not retained their history (git log on any of them just shows a message "Merged branch...").

Also I don't want to use submodules because I do not want the two combined git repositories to be separate anymore.

Is it possible to merge a remote git repository in another one as a subdirectory with individual files coming from the remote repository retaining their history?

Thanks very much for any help.

EDIT: I'm currently trying out a solution that uses git filter-branch to rewrite the merged-in repository history. It does seem to work, but I need to test it some more. I'll return to report on my findings.

EDIT 2: In the hope of making myself clearer, here are the exact commands I used with git's subtree strategy, which result in the apparent loss of history for the files of the remote repository. Let A be the git repo I'm currently working in and B the git repo I'd like to incorporate into A as a subdirectory. I did the following:

git remote add -f B <url-of-B>
git merge -s ours --no-commit B/master
git read-tree --prefix=subdir/Iwant/to/put/B/in/ -u B/master
git commit -m "Merge B as subdirectory in subdir/Iwant/to/put/B/in."

After these commands and going into directory subdir/Iwant/to/put/B/in, I see all files of B, but git log on any one of them shows just the commit message "Merge B as subdirectory in subdir/Iwant/to/put/B/in." Their file history as it is in B is lost.

What seems to work (I'm a beginner with git, so I may be wrong) is the following:

git remote add -f B <url-of-B>
git checkout -b B_branch B/master  # make a local branch following B's master
git filter-branch --index-filter \
    'git ls-files -s | sed "s-\t\"*-&subdir/Iwant/to/put/B/in/-" |
        GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
                git update-index --index-info &&
        mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD
git checkout master
git merge B_branch

The command above for filter-branch is taken from git help filter-branch, in which I only changed the subdir path.
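
A minimal, self-contained sketch of the sequence above, run against two throwaway repositories, may help when checking the result. The repository names a and b, the helper new_repo, the demo identity, and the target path sub/dir/ are assumptions made for illustration. It also assumes GNU sed (for \t) and git 2.9 or later, where merging unrelated histories needs --allow-unrelated-histories.

```shell
#!/bin/sh
set -eu

# Throwaway identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Silences the filter-branch deprecation notice on recent git versions.
export FILTER_BRANCH_SQUELCH_WARNING=1

cd "$(mktemp -d)"

# new_repo DIR FILE: hypothetical helper, creates repo DIR with two commits
# to FILE on a branch named master.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$2"
    echo "$2$2" >> "$1/$2"
    git -C "$1" commit -q -a -m "$2$2"
}

new_repo a A
new_repo b B
cd a

git remote add -f B ../b
git checkout -q -b B_branch B/master

# Rewrite every commit on B_branch so its files live under sub/dir/.
git filter-branch --index-filter \
    'git ls-files -s | sed "s-\t\"*-&sub/dir/-" |
        GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
            git update-index --index-info &&
     mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD

git checkout -q master
git merge --allow-unrelated-histories --no-edit B_branch

# Per-file history under the new path should now include B's own commits.
file_log=$(git log --format=%s -- sub/dir/B)
printf '%s\n' "$file_log"
```

Because the rewritten commits already carry the prefixed path, a plain path-limited git log can find them without --follow. The cost is that the rewritten commits have new hashes, so they no longer match the commits in the original B.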

christosc asked Jun 21 '11 13:06


2 Answers

git-subtree is a script designed for exactly this use case: merging multiple repositories into one while preserving history (and/or splitting out the history of subtrees, though that seems to be irrelevant to this question). It has been distributed as part of the git tree since release 1.7.11.

To merge a repository <repo> at revision <rev> as subdirectory <prefix>, use git subtree add as follows:

git subtree add -P <prefix> <repo> <rev> 

git-subtree implements the subtree merge strategy in a more user friendly manner.
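
As a rough illustration, the following sketch runs git subtree add against two throwaway repositories. The repository names a and b, the helper new_repo, the prefix bdir, and the demo identity are assumptions for the example. It also assumes that the git-subtree command is installed, since some distributions package it separately.

```shell
#!/bin/sh
set -eu

# Throwaway identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"

# new_repo DIR FILE: hypothetical helper, creates repo DIR with two commits
# to FILE on a branch named master.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$2"
    echo "$2$2" >> "$1/$2"
    git -C "$1" commit -q -a -m "$2$2"
}

new_repo a A
new_repo b B
cd a

# <prefix> = bdir, <repo> = ../b, <rev> = master
git subtree add -P bdir ../b master

# Both repositories' commits plus the joining merge are reachable from HEAD.
total=$(git rev-list --count HEAD)
echo "commits reachable from HEAD: $total"
```

Without --squash, the imported commits are kept as they are and joined to the current branch by one merge commit, which is why the count covers both histories.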

The downside is that in the merged history the files are unprefixed (not in a subdirectory). Say you merge repository a into b. As a result, git log a/f1 will show you all the changes (if any) except those in the merged history. You can do:

git log --follow -- f1

but that won't show any changes other than those in the merged history.

In other words, if you don't change a's files in repository b, then you need to specify --follow and an unprefixed path. If you change them in both repositories, then you have two commands, neither of which shows all the changes.
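
The split between the two views can be observed with a small sketch like the one below. It imports a throwaway repository b into a under the assumed prefix bdir, adds one more commit after the merge, and then captures both logs. The names a, b, bdir, new_repo, and the demo identity are assumptions, as is the availability of git-subtree.

```shell
#!/bin/sh
set -eu

# Throwaway identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"

# new_repo DIR FILE: hypothetical helper, creates repo DIR with two commits
# to FILE on a branch named master.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$2"
    echo "$2$2" >> "$1/$2"
    git -C "$1" commit -q -a -m "$2$2"
}

new_repo a A
new_repo b B
cd a
git subtree add -P bdir ../b master

# One change made after the merge, under the prefixed path.
echo BBB >> bdir/B
git commit -q -a -m BBB

# View 1: prefixed path, covers only the merge and later commits.
prefixed=$(git log --format=%s -- bdir/B)
# View 2: unprefixed path with --follow, covers only the imported history.
unprefixed=$(git log --follow --format=%s -- B)

echo "prefixed view:";   printf '%s\n' "$prefixed"
echo "unprefixed view:"; printf '%s\n' "$unprefixed"
```

In this sketch the commit BBB appears only in the prefixed view and the commits B and BB only in the unprefixed one.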

More on it here.

kynan answered Oct 10 '22 13:10


After getting the fuller explanation of what is going on, I think I understand it, and in any case I have a workaround at the bottom. Specifically, I believe that rename detection is being fooled by the subtree merge with --prefix. Here is my test case:

mkdir -p z/a z/b
cd z/a
git init
echo A>A
git add A
git commit -m A
echo AA>>A
git commit -a -m AA
cd ../b
git init
echo B>B
git add B
git commit -m B
echo BB>>B
git commit -a -m BB
cd ../a
git remote add -f B ../b
git merge -s ours --no-commit B/master
git read-tree --prefix=bdir -u B/master
git commit -m "subtree merge B into bdir"
cd bdir
echo BBB>>B
git commit -a -m BBB

We make git directories a and b with several commits each. We do a subtree merge, and then we do a final commit in the new subtree.

Running gitk (in z/a) shows that the history is there, and so does running git log. However, looking at a specific file is a problem: git log bdir/B does not show the commits B received before the merge.

Well, there is a trick we can play. We can look at the pre-rename history of a specific file using --follow: git log --follow -- B. This is good but not great, since it fails to link the pre-merge history with the post-merge history.

I tried playing with -M and -C, but I wasn't able to get it to follow one specific file.

So the solution, I feel, is to tell git about the rename that will take place as part of the subtree merge. Unfortunately git-read-tree is pretty fussy about subtree merges, so we have to work through a temporary directory, but that can go away before we commit. Afterwards, we can see the full history.

First, create an "A" repository and make some commits:

mkdir -p z/a z/b
cd z/a
git init
echo A>A
git add A
git commit -m A
echo AA>>A
git commit -a -m AA

Second, create a "B" repository and make some commits:

cd ../b
git init
echo B>B
git add B
git commit -m B
echo BB>>B
git commit -a -m BB

And the trick to making this work: force Git to recognize the rename by creating a subdirectory and moving the contents into it.

mkdir bdir
git mv B bdir
git commit -a -m bdir-rename

Return to repository "A" and fetch and merge the contents of "B":

cd ../a
git remote add -f B ../b
git merge -s ours --no-commit B/master
# According to Alex Brown and pjvandehaar, newer versions of git need --allow-unrelated-histories:
# git merge -s ours --allow-unrelated-histories --no-commit B/master
git read-tree --prefix= -u B/master
git commit -m "subtree merge B into bdir"

To show that they're now merged:

cd bdir
echo BBB>>B
git commit -a -m BBB

To prove the full history is preserved in a connected chain:

git log --follow B 
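
The walkthrough above can be condensed into a single script for a quick check. This is only a sketch: the helper new_repo, the demo identity, and the use of mktemp are assumptions, and it uses the --allow-unrelated-histories form of the merge, so it assumes git 2.9 or later.

```shell
#!/bin/sh
set -eu

# Throwaway identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"

# new_repo DIR FILE: hypothetical helper, creates repo DIR with two commits
# to FILE on a branch named master.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$2"
    echo "$2$2" >> "$1/$2"
    git -C "$1" commit -q -a -m "$2$2"
}

new_repo a A
new_repo b B

# Record the rename inside b itself, before merging.
mkdir b/bdir
git -C b mv B bdir
git -C b commit -q -m bdir-rename

cd a
git remote add -f B ../b
git merge -s ours --no-commit --allow-unrelated-histories B/master
git read-tree --prefix= -u B/master
git commit -q -m "subtree merge B into bdir"

# A change made after the merge.
echo BBB >> bdir/B
git commit -q -a -m BBB

# --follow should walk from BBB back across the recorded rename.
history=$(git log --follow --format=%s -- bdir/B)
printf '%s\n' "$history"
```

The key difference from the earlier test case is that the move into bdir is an ordinary commit in b's history, so --follow has a rename to detect and can continue onto the old path.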

We get the history after doing this. The problem is that if you are actually keeping the old "b" repo around and occasionally merging from it (say it is a separately maintained third-party repo), you are in trouble, since that third party will not have done the rename. You must then try to merge new changes into your version of b that has the rename, and I fear that will not go smoothly. But if b is going away, you win.

Seth Robertson answered Oct 10 '22 15:10