
Git: Copy history of file from one repository to another [duplicate]

Tags:

git

I have two git repositories, say A and B, and both contain a file named file1.cc. Is it possible to merge/copy the history of file1.cc in repo A into file1.cc in repo B?

The problem is that we've already moved the files from repo A to repo B, and the history of all the files was lost. Some of the developers have already started working in repo B and pushed their changes. So now I want to merge/copy the history of only some of the files from repo A to repo B. Is it possible to do so? Or is the history of the files, once lost, lost forever?

Please help. Thanks in advance.

Paul Varghese asked Jun 27 '17 09:06




1 Answer

It can be done, but it may not be easy. But first things first: there is no "moving the history of a file". There is only moving commits, so if you want commits that represent the history of a subset of files, then creating those commits is the first challenge.

The simplest thing would be to transfer all history. (In fact, if it happens that you made Repo B as a shallow clone of Repo A, then you could just un-shallow it and be done. But I'm guessing that's not how you created Repo B...)

Regardless, since you're moving from Repo A to Repo B, maybe there's some history you specifically want to remove. That's potentially a whole topic of its own, but let's just assume you really want only the history of a few files.

In the special case where all the files you want (and no others) are in a subdirectory, and you want (or, at least, can accept) to move those files to the repo's root directory, you can use filter-branch with the --subdirectory-filter.
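As a rough illustration of the subdirectory case, here is a sketch that builds a throwaway repo and runs the filter on it. The directory src/lib, the file names, and the commit messages are made up for the demo; FILTER_BRANCH_SQUELCH_WARNING only silences the deprecation pause that newer Git versions add to filter-branch.

```shell
#!/bin/sh
# Sketch on a throwaway repo; src/lib and all file names are hypothetical.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in newer Git

work=$(mktemp -d); cd "$work"
git init -q demo; cd demo
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name demo; git config user.email demo@example.com

mkdir -p src/lib
echo a > src/lib/file1.cc; echo r > README
git add .; git commit -q -m "add lib and README"
echo r2 >> README
git add .; git commit -q -m "README only"
echo a2 >> src/lib/file1.cc
git add .; git commit -q -m "edit file1.cc"

# Keep only the history that touches src/lib, and make src/lib the new root.
git filter-branch --subdirectory-filter src/lib -- --all >/dev/null 2>&1

files=$(git ls-tree -r --name-only master)
commits=$(git rev-list --count master)
echo "files at root: $files; commits kept: $commits"
```

The commit that only touched README should drop out, because the filter only looks at history that touches the chosen directory.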

More generally, if we assume paths shouldn't change and that the files you want could be anywhere in the tree, then you could use filter-branch with an --index-filter.

git filter-branch --index-filter 'git rm --cached --ignore-unmatch each file or *glob* you do NOT want' --prune-empty -- --all

That could take a while if the repo has a lot of commits. If the list of files to rm is not trivial, you may want to put multiple git rm commands in a shell script and use that as the --index-filter argument instead of inlining it as shown above.
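The script approach might look something like the sketch below, again on a throwaway repo. The script name prune-index.sh and the paths it removes are hypothetical. Note the absolute path to the script: filter-branch runs the filter from a temporary directory, so a relative path would not resolve.

```shell
#!/bin/sh
# Sketch on a throwaway repo; prune-index.sh and all file names are hypothetical.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in newer Git

work=$(mktemp -d); cd "$work"
git init -q demo; cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name demo; git config user.email demo@example.com

echo a > file1.cc; echo b > notes.txt
git add .; git commit -q -m "add file1.cc and notes.txt"
echo c > build.log
git add .; git commit -q -m "add build.log only"
echo a2 >> file1.cc
git add .; git commit -q -m "edit file1.cc"

# The filter script: one git rm per path or glob you do NOT want.
cat > "$work/prune-index.sh" <<'EOF'
git rm -q --cached --ignore-unmatch notes.txt
git rm -q --cached --ignore-unmatch '*.log'
EOF

git filter-branch --index-filter "sh '$work/prune-index.sh'" --prune-empty -- --all >/dev/null 2>&1

commits=$(git rev-list --count master)
files=$(git ls-tree -r --name-only master)
echo "commits kept: $commits; files: $files"
```

With --prune-empty, the commit that only added build.log becomes empty after filtering and is dropped.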

Well, one way or another, hopefully you've now got a history you'd like to graft into Repo B.

cd repo-b
git remote add repo-a path/to/repo-a
git fetch repo-a

Now you have in Repo B:

... A -- B <--(repo-a/master)
  \
   (repo-a/other-branches-maybe)

B' -- C -- D (master)(origin/master)

I'm making an assumption here: that the TREE from the last master commit in Repo A (the one from which our history rewrite created B), or at least some part of that tree, was imported as the root commit B' in Repo B.

Now you have three options: re-parent, rebase, or replace.

Since I assume the recent history state is more important than the older-history state, and that the older history is just being added for reference, the safest thing would be to reparent C to B. (You could choose to reparent B' to A instead, but I'm assuming that doesn't make much difference...)

So drawing from the filter-branch docs at https://git-scm.com/docs/git-filter-branch you could

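# be sure you're on master
echo "$commit_id $graft_id" >> .git/info/grafts
git filter-branch $graft_id..HEAD

where $commit_id is the SHA for C (the commit being reparented) and $graft_id is the SHA for B (its new parent). Each line in the grafts file is a commit followed by the parents it should appear to have.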
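Here is a sketch of the reparenting step on two throwaway repos. It uses git replace --graft rather than the grafts file; the filter-branch docs list it as the simpler equivalent, and newer Git versions deprecate .git/info/grafts. The repo contents and the variable names commit_id (C, the commit to reparent) and graft_id (B, its new parent) are made up for the demo.

```shell
#!/bin/sh
# Sketch on throwaway repos; contents and variable names are hypothetical.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in newer Git
work=$(mktemp -d); cd "$work"

new_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/master
  git -C "$1" config user.name demo
  git -C "$1" config user.email demo@example.com
}

# Repo A holds the old history: A -- B
new_repo repo-a
( cd repo-a
  echo v1 > file1.cc; git add .; git commit -q -m A
  echo v2 > file1.cc; git add .; git commit -q -m B )

# Repo B holds the new history: B' (import of B's tree) -- C -- D
new_repo repo-b
cd repo-b
echo v2 > file1.cc; git add .; git commit -q -m "B' (import)"
echo v3 > file1.cc; git add .; git commit -q -m C
echo v4 > file1.cc; git add .; git commit -q -m D

git remote add repo-a ../repo-a
git fetch -q repo-a

tree_before=$(git rev-parse 'master^{tree}')
tip_before=$(git rev-parse master)
graft_id=$(git rev-parse repo-a/master)   # B, the new parent
commit_id=$(git rev-parse master~1)       # C, the commit to reparent

git replace --graft "$commit_id" "$graft_id"          # C now appears to have parent B
git filter-branch -- "$graft_id"..HEAD >/dev/null 2>&1  # make that permanent
git replace -d "$commit_id" >/dev/null                 # the replace ref is no longer needed

tree_after=$(git rev-parse 'master^{tree}')
git log --format=%s master
```

The log should now run D, C, B, A. The tree at D is the same as before, but the SHAs of C and D have changed.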

A rebase might be a little simpler (assuming a certain level of consistency between the histories) but introduces the possibility that you end up modifying the tree at D. If you do decide to try a rebase, it would be

git rebase --onto repo-a/master B' master

where B' is the Repo B root commit's SHA ID. (Alternatively

git rebase --interactive --onto repo-a/master --root master

and then drop the entry for B'.)

Either of these options will rewrite commits C and D. (Even though re-parenting ensures the TREE is unchanged, the commits are still replaced.) Your developers would have to treat this as an upstream rebase (see the git rebase documentation under "recovering from upstream rebase"). To mitigate this, I generally recommend doing a coordinated cut-over where devs check in everything they have, discard their clones, then you do the rewrite and they re-clone from the new repo.

If you want to avoid the rewrite, you can use the third option: git replace. This is known to have a few quirks, and it requires each clone to be set up correctly in order to "see" the spliced history.

So to support this, you'd just tag B (and maybe also B'):

git tag old-history repo-a/master
git tag new-root B'

(where B' is the appropriate SHA, or an equivalent expression).

When someone clones the repo, they'll see only the new history, but they can say

git replace new-root old-history

and this will paper over the break in history.
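To see what the replace does, here is a sketch on two throwaway repos using the same tag names as above. The repo contents are made up for the demo. It counts the commits reachable from master before and after the replace, and once more with replacements switched off.

```shell
#!/bin/sh
# Sketch on throwaway repos; contents are hypothetical.
set -eu
work=$(mktemp -d); cd "$work"

new_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/master
  git -C "$1" config user.name demo
  git -C "$1" config user.email demo@example.com
}

# Repo A holds the old history: A -- B
new_repo repo-a
( cd repo-a
  echo v1 > file1.cc; git add .; git commit -q -m A
  echo v2 > file1.cc; git add .; git commit -q -m B )

# Repo B holds the new history: B' -- C -- D
new_repo repo-b
cd repo-b
echo v2 > file1.cc; git add .; git commit -q -m "B' (import)"
echo v3 > file1.cc; git add .; git commit -q -m C
echo v4 > file1.cc; git add .; git commit -q -m D

git remote add repo-a ../repo-a
git fetch -q repo-a

git tag old-history repo-a/master
git tag new-root "$(git rev-list --max-parents=0 master)"   # B', the root commit

before=$(git rev-list --count master)     # B', C, D
git replace new-root old-history          # wherever B' is read, use B instead
after=$(git rev-list --count master)      # D, C, B (standing in for B'), A
plain=$(git --no-replace-objects rev-list --count master)   # the real, unspliced history

echo "before: $before, after: $after, without replacements: $plain"
```

Nothing is rewritten here: C and D keep their SHAs, and the spliced view only exists in clones where the replace ref has been set up.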

Once you've done your reparent, rebase, or replace, you can remove the repo-a remote.

Mark Adelsberger answered Oct 15 '22 15:10
