
Merging two completely different repositories

I have a git repository (let's call it A) which has quite a few commits and tags in it.

I recently created a new repository (let's call this one B) and I did some commits in there (no tags, no branches other than master). After some work I realized that work in B could completely override A.

Is there some way to "merge" both repositories in such a way that no file from A would be preserved after the merging commit (but they would still exist before that commit), and the entire history of B would be preserved?

Graphical (kind of) example (for the sake of this example, think about git commits as if they were svn commits/numbers):

Repo A at commit 20:

foo.txt <-- 4 bytes
bar.txt <-- 2 bytes

Repo B at commit 14:

foo.txt <-- 3 bytes
cat.txt <-- 1 byte

----Merge operation----

Repo A after merge, commit 34:

foo.txt <-- 3 bytes
cat.txt <-- 1 byte

Extras: Repository A is a GitHub-hosted git repo, while B exists only on my dev machine.

Asked by alexandernst, Mar 13 '16 at 16:03



2 Answers

[Edit, 28 Oct 2016: since Git version 2.9, released in mid-June 2016, you must add the flag --allow-unrelated-histories to your merge command to let Git attempt this kind of merge in the first place. The rest of this otherwise still applies.]

If I understand what you want correctly, it's not only possible, it's really quite trivial. But I may not understand correctly, so read the below carefully. There's a lot of explanation and a slow setup to do it the hard way (which lets you inspect everything as you go). Then, at the end, there's a single command to do everything all at once (assuming you've set up the remote and done the git fetch first, that is).

Git's commit DAGs

Git is quite different from most other version control systems. It operates on (and using) a commit graph, which is simply any Directed Acyclic Graph (or DAG).

A typical DAG starts with a single root, and has branches and merges such as:

        o - o - o
      /           \
o - o - o - o - o - X   <-- master
      \
        o - o - o       <-- topic

(this looks a bit like a hamburger, so let's call it "the hamburger repo"—I'll explain why there is one commit marked X later), or:

o - o - o               <-- A
       \
        o - o - Y       <-- B

(let's call this "the AB repo", and again the reason for the Y is explained later).

However, git allows entirely disconnected ("disjoint") sub-graphs:

o - o - o               <-- A
       \
        o - o - Y       <-- B

        o - o - o
      /           \
o - o - o - o - o - X   <-- master
      \
        o - o - o       <-- topic

Git "remotes"

To take an existing repository like the AB repo, and add another, different repository to its graph, simply add the different repository as a remote and use git fetch. For instance, starting with the AB repo as your current repository, you can git remote add hamburger <url> to add the hamburger repository as a "remote". At this point, running git fetch hamburger will bring over all the hamburger commits. Since they are not related to the AB-repo commits, they will be inserted as a disjoint subgraph. Git will also rename the branch labels in its usual way, so that master becomes hamburger/master and so on. In other words, the actual repository at this point looks like this:

o - o - o               <-- A
       \
        o - o - Y       <-- B

        o - o - o
      /           \
o - o - o - o - o - X   <-- hamburger/master
      \
        o - o - o       <-- hamburger/topic
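
To make the fetch step concrete, here is a minimal sketch that builds two throwaway repositories and fetches one into the other. The directory names, file names, and the demo identity are assumptions made up for this illustration; only git remote add and git fetch come from the text above.

```shell
#!/bin/sh
set -e
# Hypothetical throwaway setup: two unrelated repositories in a temp directory.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

make_repo() {    # make_repo <dir> <file>: create a repo with one commit on master
    git init -q "$work/$1"
    git -C "$work/$1" symbolic-ref HEAD refs/heads/master
    echo "$1" > "$work/$1/$2"
    git -C "$work/$1" add "$2"
    git -C "$work/$1" commit -q -m "first commit in $1"
}
make_repo ab ab.txt
make_repo hamburger burger.txt

cd "$work/ab"
git remote add hamburger "$work/hamburger"   # a local path stands in for <url>
git fetch -q hamburger

git branch -r    # the fetched branch appears under the name hamburger/master
# The two histories share no commit, so merge-base finds nothing and fails:
git merge-base master hamburger/master || echo "no common ancestor"
```

The local master and hamburger/master coexist without clashing, which is the renaming behaviour described above.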

Identifying commits for merges, and --first-parent

You may now "merge" any of the commits in this graph, by getting onto a local branch pointing to the desired commit. Let's say, for instance, that you want to create a new local branch named master that ties together the hamburger/master branch—i.e., commit X—and the B branch, i.e., commit Y, disregarding all the other commits for a moment.

First, we need to create the branch, pointing to either X or Y. We must choose one of the two. For the purpose of doing the merge itself, which one we choose doesn't matter, but for the purpose of following the history later, it does matter. Which is the right one? The answer depends on what you want to see later.

Git has the concept of following the "first parent" (using a flag spelled --first-parent) when looking at the history of a branch. While git itself doesn't care which is first and which is not, we humans tend to want to know which one was the "main" branch and which was the "side" branch being merged in. The --first-parent flag is meant to let us see only the "main" branch, and graphical log viewers like gitk will draw the "main" branch as a continuous straight line with the "side" branch splitting off from it (see, e.g., this image in this SO question).

If you want B, and commit Y, to look like the "main" branch, we should check out a branch pointing to commit Y. If you want master, and commit X, to look like the "main" branch, we should check out a branch pointing to commit X. (Now you know why we labeled these commits X and Y!) We already have such a branch for commit Y—it's local branch B—but we don't have one for X yet; it has only the name hamburger/master pointing to it, and that name is a "remote branch", not a regular local branch.
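
As a rough illustration of what "first parent" means in practice, the sketch below builds a tiny history with one side branch and compares the full log with the --first-parent view. The repository, the branch name side, and the commit subjects are all invented for the example.

```shell
#!/bin/sh
set -e
# Hypothetical scratch repository, used only to show --first-parent.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master

commit() {    # commit <name>: add a file called <name>.txt and commit it
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit main-1
git checkout -q -b side
commit side-1
git checkout -q master
commit main-2
# master is checked out, so master's tip becomes the FIRST parent of the merge.
git merge -q --no-ff -m "merge side" side

git log --oneline --graph            # every commit, including side-1
git log --oneline --first-parent     # only the merge, main-2 and main-1

all_count=$(git rev-list --count HEAD)
first_parent_count=$(git rev-list --count --first-parent HEAD)
```

Whichever branch is checked out when the merge is made supplies the first parent, which is why the choice between X and Y matters for later history viewing.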

New commits (merges or regular) go on (local) branches

In either case, we can—and if you're new to git and not familiar with all the ways to recover from errors, should—use a new local branch to do this merge. So let's get a new local branch, pointing to either commit X:

git checkout -b for-merge hamburger/master

or commit Y:

git checkout -b for-merge B

(remember, remote-branch hamburger/master points to commit X, and local branch B points to commit Y: we saw these when we drew the graph). If you prefer, you can put in the actual SHA-1 hash for the commit. Git is just going to turn the name hamburger/master or B into the appropriate SHA-1 hash anyway.

Most likely, you want the main (first-parent) branch to follow branch B's history, so we want git checkout -b for-merge B. (In fact, in your repository, it's probably not named B, it's probably master. Note that it's quite OK to have both master and the unrelated hamburger/master: this is why git fetch renames branches.)

Doing the (special) merge

Now that we're on this for-merge branch, we can do the merge, but per your question, we don't want a normal merge at all. In fact, a normal merge will mainly just get in the way, as there is no merge base. What git does in this case is to use an empty tree as the merge base, so you tend to get a lot of create/create conflicts. So what we may want to do in the end is use an internal (not for normal everyday use) git command, git commit-tree, to make our new commit.

Before we get there, though, let's see how we'd do this with the normal merge command.

First, just in case it actually works, we don't want git to commit the merge, so let's use --no-commit. Then, the only other thing we need to do is point git merge at the commit to be merged. This is most likely commit X, which we can name by its actual SHA-1, or by the name hamburger/master. (As noted at the top, Git 2.9 and later also need --allow-unrelated-histories here; leave it out on older versions.)

git merge --no-commit --allow-unrelated-histories hamburger/master

Most likely you'll get a bunch of conflicts at this point. To resolve them, since what you want is the contents of commit Y (from branch B), let's begin by removing everything in the merge mess:

git rm -rf .    # (note: this assumes you're at the top of your work tree)

Now we re-populate the work tree (and the index/staging-area) from commit Y, which is pointed-to by both the name B and the current branch for-merge and therefore by HEAD:

git checkout HEAD -- .  # (still assumes top of work tree)

At this point everything is resolved properly (you can check with git status) so you can just go ahead and git commit. The result is a merge commit tying everything together, on your new branch:

o - o - o               <-- A
       \
        o - o - Y       <-- B
                 \
                   ----- M   <-- for-merge
                       /
        o - o - o     /
      /           \  /
o - o - o - o - o - X   <-- hamburger/master
      \
        o - o - o       <-- hamburger/topic
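
The whole slow path can be replayed on the example from the question. This is only a sketch under assumptions: the temp directories, file contents, commit messages, and the remote name hamburger (standing in for the old repository A) are made up, and --allow-unrelated-histories assumes Git 2.9 or later.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Old repository A: foo.txt (4 bytes) and bar.txt (2 bytes).
git init -q "$work/A"
cd "$work/A"
git symbolic-ref HEAD refs/heads/master
printf 'aaaa' > foo.txt
printf 'bb' > bar.txt
git add .
git commit -q -m "A: old content"

# New repository B: foo.txt (3 bytes) and cat.txt (1 byte).
git init -q "$work/B"
cd "$work/B"
git symbolic-ref HEAD refs/heads/master
printf 'ccc' > foo.txt
printf 'd' > cat.txt
git add .
git commit -q -m "B: new content"

# Bring A's history into B as a disjoint subgraph.
git remote add hamburger "$work/A"
git fetch -q hamburger

# Do the merge on a scratch branch, with B's commit as first parent.
git checkout -q -b for-merge master
# An add/add conflict on foo.txt is expected, so a non-zero exit is tolerated.
git merge --no-commit --allow-unrelated-histories hamburger/master || true

git rm -r -f -q .          # throw away the half-merged state
git checkout HEAD -- .     # restore exactly the files of commit Y
git commit -q -m "Merge old repository A, keeping only B's files"

ls                         # only cat.txt and foo.txt remain
git log --oneline --graph
```

Deleting for-merge afterwards (from another branch) leaves both histories untouched, so the experiment is cheap to repeat.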

You can now check out any of the various commits and inspect them to make sure you like the result. If you do like the result, rename the for-merge branch to whatever name you prefer (e.g., master) and you are ready to go. (You may need to rename the old master out of the way first, to do this. There are many other options, such as fast-forwarding master to the new merge commit, or using git reset --hard to move to it, but they all wind up doing mostly the same thing, except for how they leave their traces in reflogs.)

If you don't like the result, check out some other branch—any branch—and use git branch -D for-merge to delete the merge you just made. You'll be back to the two separate graphs in your one repository, ready to try something different. (This is why we made a for-merge branch.)

Doing it all the shortcut (easy) way

Instead of most of the above, once you've fetched the hamburger repo, you can make a merge commit with the desired tree and the correct pair of parent commits, and then set whatever branch label you want to the new commit, all in one command. Starting from whatever branch you want to have point to the merge commit (B, or more likely, master):

git merge --ff-only $(git commit-tree -p HEAD -p hamburger/master 'HEAD^{tree}')

The git commit-tree command writes a tree ID (in this case, 'HEAD^{tree}') into a new commit whose parents are given by the (ordered) -p arguments. Here the two parents are the current commit, HEAD, and the commit identified by hamburger/master. By using the current commit's tree, we make the new commit's tree exactly match the current commit's (which, per your question, is what I think you want for these contents). Because no -m or -F option is given, git commit-tree reads the commit message from standard input.

The output from git commit-tree is the new commit's hash, so we then move the current branch label in fast-forward fashion to the new commit.

Note that you should only do this if you really understand everything that's happening here, and you really want to use the exact same work-tree after the merge as before.
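
Here is the shortcut replayed on the same two-file example from the question, as a sketch rather than a recipe. The temp directories, file contents, and the remote name hamburger (standing in for the old repository A) are assumptions, and -m is added so that git commit-tree does not wait for a message on standard input.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Old repository A: foo.txt (4 bytes) and bar.txt (2 bytes).
git init -q "$work/A"
cd "$work/A"
git symbolic-ref HEAD refs/heads/master
printf 'aaaa' > foo.txt
printf 'bb' > bar.txt
git add .
git commit -q -m "A: old content"

# New repository B: foo.txt (3 bytes) and cat.txt (1 byte).
git init -q "$work/B"
cd "$work/B"
git symbolic-ref HEAD refs/heads/master
printf 'ccc' > foo.txt
printf 'd' > cat.txt
git add .
git commit -q -m "B: new content"

git remote add hamburger "$work/A"
git fetch -q hamburger

before=$(git rev-parse HEAD)
# Build the merge commit by hand: B's tree, parents = (B's tip, A's tip).
new=$(git commit-tree -p HEAD -p hamburger/master \
        -m "Merge old repository A, keeping only B's files" 'HEAD^{tree}')
# The new commit descends from HEAD, so this is a plain fast-forward.
git merge -q --ff-only "$new"

git diff --quiet "$before" HEAD && echo "tree unchanged by the merge"
git log --oneline --graph
```

No conflict resolution happens at any point, because git merge never compares the two trees; it only moves master forward to a commit that already exists.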

Answered by torek, Sep 22 '22 at 10:09


What I believe you're describing is a complete repository replacement, such that Repo B and all its history, etc., is reflected in Repo A. A few ideas:

Idea 1:
1) Repo A: delete everything and commit
2) Repo B gets merged into Repo A
3) Repo A is committed and pushed

Idea 2:
1) Add a new remote on Repo B that points to the same remote as Repo A
2) Do a git push --force to update Repo A with exactly the state of Repo B

Pretty sure that Idea 1 works, though it is a little bit of a hack, but I think that Idea 2 "should" work too, because the force should just ignore any disconnects between the state of Repo A and Repo B and simply replace things.
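
Idea 2 could look roughly like the sketch below, where a local bare repository stands in for the hosted Repo A. All paths, file contents, and the remote name origin are assumptions for illustration. Note that a forced push replaces the remote branch outright, so A's earlier commits are no longer reachable from it.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Repo A, plus a bare clone playing the role of the hosted copy.
git init -q "$work/A"
cd "$work/A"
git symbolic-ref HEAD refs/heads/master
printf 'aaaa' > foo.txt
git add .
git commit -q -m "A: old content"
git clone -q --bare "$work/A" "$work/A.git"

# Repo B, unrelated to A.
git init -q "$work/B"
cd "$work/B"
git symbolic-ref HEAD refs/heads/master
printf 'ccc' > foo.txt
git add .
git commit -q -m "B: new content"

# Point B at A's hosted location and overwrite its master.
git remote add origin "$work/A.git"
git push -q --force origin master

git -C "$work/A.git" log --oneline master    # only B's history is listed
```

Compared with the merge approaches in the other answer, this does not keep A's old commits in the history of the resulting branch.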

Answered by Scott Sosna, Sep 18 '22 at 10:09