
Change the root commit parent to point to another commit (connecting two independent git repositories)

I have a project with more than 3 years of history in an svn repository. It was migrated to git, but the person who did the migration just took the latest version and threw away all 3 years of history.

Now the project has the last 3-4 months of history in one repository, and I've imported the other 3 years of svn history into a new git repository.

Is there some way to connect the root commit of the second repository into the last commit of the first one?

It is something like this:

  *   2017-04-21 - last commit on master
  |   
  *   2017-03-20 - merge branch Y into master
  |\  
  | * 2017-03-19 - commit on branch Y
  | | 
  * | 2017-03-18 - merge branch X into master
 /| * 2017-02-17 - commit on another new branch Y
* |/  2017-02-16 - commit on branch X
| *   2017-02-15 - commit on master branch
* |   2017-01-14 - commit on new branch X
 \|   
  *   2017-01-13 - first commit on new repository
  |   
  *   2017-01-12 - init new git project with the last version of the code in svn repository
  .   
  .   
There is no relationship between the two repositories yet; creating one is what I want
to do. I want to connect the root commit of the 2nd repository with the last commit of the
first one.
  .
  .   
  *   2017-01-09 - commit
  |   
  *   2017-01-08 - commit
  |   
  *   2017-01-07 - merge
 /|   
* |   2016-01-06 - 2nd commit the other branch
| *   2016-01-05 - commit on trunk
* |   2016-01-04 - commit on new branch
 \|   
  *   2015-01-03 - first commit
  |   
  *   2015-01-02 - beginning of the project

Update:

I just learned that I need to do a git rebase, but how? Please treat the commit dates as if they were the SHA-1 hashes... The answer was to use git filter-branch with the --parent-filter option, not a git rebase.

Update 2:

I tried the command git filter-branch --parent-filter 'test $GIT_COMMIT = 443aec8880e898710796a1c4fb4decea1ca5ff66 && echo "-p 98e2b95e07b84ad1e40c3231e66840ea910e9d66" || cat' HEAD and it didn't work:

PS D:\git\rebase-test\rep2cc> git filter-branch --parent-filter 'test $GIT_COMMIT = 443aec8880e898710796a1c4fb4decea1ca5ff66 && echo "-p 98e2b95e07b84ad1e40c3231e66840ea910e9d66" || cat' HEAD
fatal: ambiguous argument '98e2b95e07b84ad1e40c3231e66840ea910e9d66 || cat': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

Update 3:

It didn't work in Windows CMD or PowerShell, but it did work in Git Bash on Windows.

asked May 19 '17 at 19:05 by lmcarreiro




1 Answer

First things first: you need a single repo that has all the available history.

Make a clone of the repo with the recent history, then add the repo with the old history as a remote. I recommend making this clone a "mirror" and finishing by replacing your origin repo with it. Alternatively, you can leave --mirror off and finish by pushing (possibly force-pushing, depending on which approach you use) all refs back to origin.

git clone --mirror url/of/current/repo
cd repo.git   # a mirror clone is bare, so the directory is repo.git (plain "repo" without --mirror)
git remote add history url/of/historical/repo
git fetch history

The next thing you need to do is figure out where you'll be splicing the history. The terminology for this is a bit fuzzy, I think... what you want is to find the two commits that correspond to the most recent SVN revision for which both histories have a commit. For example, say your SVN repo contained versions 1, 2, 3, and 4. Now you have

Recent-History Repo

C --- D --- E --- F <--(master)

Old-History Repo

A --- B --- C' --- D'

where A represents version 1, B represents version 2, C and C' represent version 3, and D and D' represent version 4. E and F are work created after the original migration. So you want to splice the commits whose parent is D (E in this example) onto D'.
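One way to locate D and D' is to compare tree IDs, since two commits created from the same SVN revision should have identical trees. Below is a minimal sketch in a throwaway sandbox. It assumes the tip of the old history (history/master) is the shared revision; the directory layout, file name, commit messages, and the make_repo helper are all made up for illustration.

```shell
#!/bin/sh
# Sandbox sketch: find the splice pair (D, D') by matching tree IDs.
# Every path, file name, and commit message here is hypothetical.
set -eu
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# make_repo DIR LABEL VERSION...  creates one commit per version number.
make_repo() {
    dir=$1; label=$2; shift 2
    git init -q "$dir"
    git -C "$dir" symbolic-ref HEAD refs/heads/master
    for v in "$@"; do
        echo "version $v" > "$dir/file.txt"
        git -C "$dir" add file.txt
        git -C "$dir" commit -q -m "$label: version $v"
    done
}

make_repo "$sandbox/old" old 1 2 3 4         # A, B, C', D'
make_repo "$sandbox/recent" recent 3 4 5 6   # C, D, E, F

cd "$sandbox/recent"
git remote add history "$sandbox/old"
git fetch -q history

# D' is assumed to be the tip of the old history.
d_prime=$(git rev-parse history/master)
tree=$(git rev-parse "$d_prime^{tree}")

# D is the newest commit on master whose tree matches the tree of D'.
d=$(git log --format='%H %T' master |
    awk -v t="$tree" '$2 == t { print $1; exit }')

echo "D  = $d"
echo "D' = $d_prime"
```

If no commit in the recent history has a matching tree, d ends up empty, which suggests the two histories do not share an identical snapshot and the splice point has to be chosen by hand.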

Now, I can think of two approaches, each with pros and cons.

Rewriting The Recent History

IMO the best way, if you can coordinate a cut-over of all developers to a new repo, is to (effectively) rebase the recent history onto the old history. A cut-over means you arrange a time when everyone agrees that all outstanding work is pushed, so they discard their clones; then you do the conversion; then they all re-clone.

If there is really just a single branch, then you can literally use rebase

git rebase --onto D' D master

(where D and D' are replaced with the SHA ID of the commits).

More likely you have some branches and merges in the recent history; in that case a rebase operation will start becoming a problem very quickly. On the other hand, you can take advantage of the fact that D has the same tree as D' -- so a rebase and a re-parent are more or less equivalent.

So you can use git filter-branch with a --parent-filter to do the rewrite. Based on the examples in the docs at https://git-scm.com/docs/git-filter-branch you would do something like

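git filter-branch --parent-filter 'test $GIT_COMMIT = E && echo "-p D'" || cat' HEAD

(where E and D' are replaced with the SHA IDs of the commits; E is the commit whose parent is D, so it is the one that receives D' as its new parent). Run this from a POSIX shell such as Git Bash; as the question's updates show, the quoting does not survive Windows CMD or PowerShell.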

This creates "backup" refs (under refs/original/) that you'll need to clean up. In the end you'll get

A --- B --- C' --- D' --- E' --- F' <--(master)
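The re-parenting can be tried out safely in a throwaway sandbox first. The sketch below follows the description above (E, the child of D, is given D' as its parent) on a single linear branch. The repo layout, file name, commit messages, and the make_repo helper are hypothetical; FILTER_BRANCH_SQUELCH_WARNING only silences the deprecation notice that newer git versions print.

```shell
#!/bin/sh
# Sandbox sketch: re-parent E onto D' with filter-branch --parent-filter.
# Every path, file name, and commit message here is hypothetical.
set -eu
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# make_repo DIR LABEL VERSION...  creates one commit per version number.
make_repo() {
    dir=$1; label=$2; shift 2
    git init -q "$dir"
    git -C "$dir" symbolic-ref HEAD refs/heads/master
    for v in "$@"; do
        echo "version $v" > "$dir/file.txt"
        git -C "$dir" add file.txt
        git -C "$dir" commit -q -m "$label: version $v"
    done
}

make_repo "$sandbox/old" old 1 2 3 4         # A, B, C', D'
make_repo "$sandbox/recent" recent 3 4 5 6   # C, D, E, F

cd "$sandbox/recent"
git remote add history "$sandbox/old"
git fetch -q history

d_prime=$(git rev-parse history/master)   # D'
e=$(git rev-parse master~1)               # E, the child of D

# Give E the parent D' instead of D; all other commits keep their parents.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
    --parent-filter "test \$GIT_COMMIT = $e && echo '-p $d_prime' || cat" \
    master >/dev/null

git log --format=%s master        # rewritten history, newest first
git for-each-ref refs/original/   # the backup ref left by filter-branch
```

In a history with branches and merges, D may have more than one child; each of those children would need the same treatment, so treat this as a starting point rather than a finished recipe.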

It's the fact that F was replaced by F' which creates the need for a hard cut-over (more or less).

Now if you made a mirror clone back at step 1, you can consider wiping the reflog, dropping the remotes, and running gc, and then this is a new ready-to-use origin repo.
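The cleanup might look roughly like the sketch below, shown in a throwaway sandbox. The backup ref is created by hand as a stand-in for what filter-branch leaves behind, and the repo names, file name, and commit messages are hypothetical. Note that these commands discard recovery data, so they belong at the very end, once the rewritten history has been checked.

```shell
#!/bin/sh
# Sandbox sketch: drop backup refs, the extra remote, reflogs, and loose objects.
# Every path, file name, and commit message here is hypothetical.
set -eu
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Two tiny repos: "old" plays the historical repo, "repo" the combined one.
for r in old repo; do
    git init -q "$sandbox/$r"
    git -C "$sandbox/$r" symbolic-ref HEAD refs/heads/master
    echo "$r" > "$sandbox/$r/file.txt"
    git -C "$sandbox/$r" add file.txt
    git -C "$sandbox/$r" commit -q -m "$r: only commit"
done

cd "$sandbox/repo"
git remote add history "$sandbox/old"
git fetch -q history
# Stand-in for the backup ref that filter-branch would have created.
git update-ref refs/original/refs/heads/master master

# 1. Delete the backup refs under refs/original/.
git for-each-ref --format='%(refname)' refs/original/ |
    xargs -n 1 git update-ref -d
# 2. Drop the remote that pointed at the historical repo.
git remote remove history
# 3. Expire the reflogs and prune whatever is no longer reachable.
git reflog expire --expire=now --all
git gc --quiet --prune=now
```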

If you made a regular clone, then you'll need to push -f all the refs to the origin, and this will likely leave behind some clutter on the origin repo.

Using a "replacement commit"

The other option doesn't create a hard cut-over, but it leaves you with small headaches to deal with forever. You can use git replace. In your combined repo

git replace D D'

(where again D and D' are replaced with the SHA IDs of the commits).

By default, when generating log output or whatever, if git finds D, it will substitute D' (and its history) in the output.

There are some known glitches. There may be unknown glitches. And by default the "replacement refs" that make this all work aren't shared, so you have to push and fetch them deliberately.
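A minimal sandbox sketch of the replacement approach, including one way to share the replacement refs explicitly. The repo layout, the bare origin.git, the colleague clone, the file name, the commit messages, and the make_repo helper are all hypothetical.

```shell
#!/bin/sh
# Sandbox sketch: splice with git replace, then share refs/replace/ explicitly.
# Every path, file name, and commit message here is hypothetical.
set -eu
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# make_repo DIR LABEL VERSION...  creates one commit per version number.
make_repo() {
    dir=$1; label=$2; shift 2
    git init -q "$dir"
    git -C "$dir" symbolic-ref HEAD refs/heads/master
    for v in "$@"; do
        echo "version $v" > "$dir/file.txt"
        git -C "$dir" add file.txt
        git -C "$dir" commit -q -m "$label: version $v"
    done
}

make_repo "$sandbox/old" old 1 2 3 4         # A, B, C', D'
make_repo "$sandbox/recent" recent 3 4 5 6   # C, D, E, F

cd "$sandbox/recent"
git remote add history "$sandbox/old"
git fetch -q history

d_prime=$(git rev-parse history/master)   # D'
d=$(git rev-parse master~2)               # D

# Whenever git looks up D, it now reads D' (and its ancestry) instead.
git replace "$d" "$d_prime"

with=$(git rev-list --count master)
without=$(git --no-replace-objects rev-list --count master)
echo "commits with replacement: $with, without: $without"

# Replacement refs live under refs/replace/ and need an explicit refspec.
git init -q --bare "$sandbox/origin.git"
git -C "$sandbox/origin.git" symbolic-ref HEAD refs/heads/master
git remote add origin "$sandbox/origin.git"
git push -q origin master 'refs/replace/*:refs/replace/*'

# A fresh clone does not receive them until it fetches them explicitly.
git clone -q "$sandbox/origin.git" "$sandbox/colleague"
git -C "$sandbox/colleague" fetch -q origin 'refs/replace/*:refs/replace/*'
colleague=$(git -C "$sandbox/colleague" rev-list --count master)
echo "commits seen by the colleague clone: $colleague"
```

The --no-replace-objects option is a convenient way to check what the history looks like to a clone that has not fetched the replacement refs.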

answered Sep 30 '22 at 06:09 by Mark Adelsberger