Where do I merge child branch to if it is based on a parent branch?

Tags:

git

Branch A is forked from master, and Branch B is forked from Branch A. After B was created, a few more commits were made on A. Master may also have additional commits from other branches, unrelated to A and B. The following diagram illustrates this scenario:

[Image: diagram of master, Branch A, and Branch B]

  1. When the parent Branch A is not yet merged to master, should Branch B be merged to master or Branch A?
  2. When the parent Branch A is merged to master, should Branch B be merged to Branch A or master?
  3. If A and B are both ready for pull requests, which branch should be merged first?
Matt asked Apr 07 '18 22:04


2 Answers

TL;DR

Your question can't quite be answered as asked. The thing to realize is that Git does not care much about branch names; what it cares about is commits, specifically reachable commits. To understand this, you need to understand the commit graph (and your drawing suggests you are well on your way here).

The eventual answer, though, is that it probably does not matter anyway.

Long

The question is not answerable, in part because in Git, branches do not have parent/child relationships, and in part because pull requests are not merges.

You drew this (more nicely than I will, but this is meant to show the same thing as your graph):

commit --> commit --> commit --> commit --> commit --> master
              |
          branchA --> commit --> commit --> commit
                        |
                     branchB --> commit --> commit --> ...

This is a subtly incorrect drawing, and it will lead you astray. (Let me ask a mostly-rhetorical question: Why did you draw the name master on the right and the names of two branches towards the left?)

Let's draw this again, using single uppercase letters as stand-ins for actual commit hash IDs, in a way that's more accurate:

A  <-B  <-C  <-D  <-E  <-F   <--master
      \
       G  <-H  <-I  <-J   <--branchA
             \
              K  <-L  <-M  <-N  <-O   <--branchB

There are two primary differences here: the first and most obvious, but actually least important in a way, is that I made all the arrows backwards. This is because Git actually works backwards.

Commits do have parent/child relationships, but parent commits are entirely unaware of their children. (This is because a commit, once made, is frozen forever. Its children, if it will have any, do not yet exist. The children, once made, are made when the parent(s) do exist, so they freeze into themselves the IDs of their parents.)

The key difference in my drawing, though, is that the branch names lie towards the right of the graph, where new commits can be added. Each name points to the last commit on the branch. Git calls this last-commit-on-the-branch the tip commit. The process of adding a new commit begins with checking out a branch by checking out its tip commit. This tip commit becomes the current commit. You then do the usual work, run git add, and run git commit. The git commit part of this makes a new commit whose parent is the current commit.

It is at this point that the magical bit happens: Git changes the current branch name so that it points to the new commit we just made.

Let's draw the above a bit more compactly, and add a new branch name branchC, pointing to the same commit as master. We will also git checkout branchC so that we attach our HEAD to it: that way Git knows which branch name to change.

A--B--C--D--E--F   <-- master, branchC (HEAD)
    \
     G--H--I--J   <-- branchA
         \
          K--L--M--N--O   <-- branchB

We may make our new commit P, with parent F, and Git will now change branchC since it is the name to which HEAD is attached:

                 P   <-- branchC (HEAD)
                /
A--B--C--D--E--F   <-- master
    \
     G--H--I--J   <-- branchA
         \
          K--L--M--N--O   <-- branchB

In other words, the commit that was the tip of branchC a moment ago is now the parent of the commit that is the tip of branchC now. The branch name moved.
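The name-moving step can be sketched with a toy model in Python. This is only an illustration, not how Git stores anything: the `parents` and `branches` dictionaries, the `head` variable, and the `commit` function are made-up names, and single letters stand in for hash IDs.

```python
# Toy model (an assumption for illustration): a commit records only its
# parents, and a branch is just a movable name for one commit.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"], "F": ["E"]}
branches = {"master": "F", "branchC": "F"}
head = "branchC"  # HEAD is attached to a branch name, not directly to a commit


def commit(new_id):
    """Add a commit whose parent is the current tip, then move the current name."""
    parents[new_id] = [branches[head]]  # the child freezes in its parent's ID
    branches[head] = new_id             # only the checked-out branch name moves


commit("P")
print(branches)      # {'master': 'F', 'branchC': 'P'}
print(parents["P"])  # ['F']
print(parents["F"])  # ['E']  (existing commits are untouched)
```

Note that nothing stored for F changes: F still knows only about E, and only the entry for `branchC` is rewritten.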

No existing commits changed! Nothing about the graph changed except for the addition of the new commit; but the branch names move. As far as Git itself is concerned, the branch names themselves are largely irrelevant.[1] What matters to Git are the commits. Normal, everyday Git work is about adding new commits. But commit hash IDs are big, ugly, and impossible to remember, so Git gives us names, such as branch names, to remember specific commits.

The specific commits we want branch names to remember are the tip commits of our branches. This leaves an interesting question: What exactly do we mean by "branch"? In this case, we probably mean two different things at the same time:

  • a branch name, and
  • an ill-defined, but vaguely reasonable, subset of the commits reachable from the tip commit by working backwards.

(As a side note, I will mention that all names are moveable. It's possible to uproot a tag name as well, for instance. The key difference between a branch name and a tag name is that a tag name is not supposed to move: it should identify one specific commit when it is made, and keep identifying that same commit forever.)


[1] Branch names also play an important role in keeping commits reachable. A commit that exists in the graph, but has no external name by which it can be reached, will eventually be garbage collected and deleted. We need not worry about that here, though.


Defining "a branch" via reachability

We just saw that branch names identify only one particular commit—the tip of a branch. This is particularly interesting (or annoying or confusing, perhaps) when we have two different branch names that identify the same commit, as was the case here:

A--B--C--D--E--F   <-- master, branchC (HEAD)

Which branch are these six commits on? Git's first answer is: they are all on both branches. The tip of master is commit F, and the tip of branchC is also F. From F, we can walk back (via its parent link) to commit E, and then on to D and so on, all the way back to A.

Note that commits G and later are not reachable this way:

A--B--C--D--E--F   <-- master, branchC (HEAD)
    \
     G--H--I--J   <-- branchA

as we must always move backwards, from child to parent, when we do this kind of graph walk. But we can see that commits J back through G are on branchA, and—here's the tricky part—so are B and A.

In fact, A, which is our first commit ever and is therefore what Git calls a root commit, is on every branch. (It's possible to make Git graphs that have more than one root commit, but there's no point here.)
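Reachability is easy to sketch in Python. This is a toy model of the drawing above, not Git's implementation: the `parents` dictionary, `reachable`, and `branches_containing` are hypothetical names chosen for the example.

```python
# Toy copy of the graph above: each commit maps to its list of parents.
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"], "F": ["E"],
    "G": ["B"], "H": ["G"], "I": ["H"], "J": ["I"],
}
branches = {"master": "F", "branchC": "F", "branchA": "J"}


def reachable(tip):
    """Return every commit found by walking child-to-parent links from tip."""
    seen, todo = set(), [tip]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(parents[c])
    return seen


def branches_containing(commit):
    """Names whose tip commit can reach the given commit."""
    return sorted(n for n, tip in branches.items() if commit in reachable(tip))


print(sorted(reachable("F")))    # ['A', 'B', 'C', 'D', 'E', 'F']
print(branches_containing("G"))  # ['branchA']
print(branches_containing("A"))  # ['branchA', 'branchC', 'master']
```

The walk only ever follows parent links, which is why G is invisible from master while the root commit A shows up everywhere.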

Excluding some commits

When using Git, it's extremely common to want to ask about commits that are on some branch (or maybe "contained within" is a better phrase), by virtue of being reachable from that branch name's tip commit, but that are not on, or contained within, some other name. For instance, commits A and B are on both master and branchA, but we might want to look at those commits that are on branchA excluding any that are also on master. This will get us the set J back through G, which is just what we want here.

The short form of this is something you will have seen elsewhere:

git log master..branchA

The slightly longer form actually makes more sense:

git log branchA ^master

which means commits reachable from branchA, excluding (^) commits reachable from master.
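In set terms, the exclusion is just a set difference over reachable commits. A minimal Python sketch, assuming the toy graph from the drawings (the names `parents`, `reachable`, and `log_excluding` are made up for illustration and are not Git APIs):

```python
# Toy copy of the full graph: each commit maps to its list of parents.
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"], "F": ["E"],
    "G": ["B"], "H": ["G"], "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"], "M": ["L"], "N": ["M"], "O": ["N"],
}
tips = {"master": "F", "branchA": "J", "branchB": "O"}


def reachable(tip):
    """Return every commit found by walking child-to-parent links from tip."""
    seen, todo = set(), [tip]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(parents[c])
    return seen


def log_excluding(include, exclude):
    """Model of `include ^exclude`: reachable from one name but not the other."""
    return sorted(reachable(tips[include]) - reachable(tips[exclude]))


print(log_excluding("branchA", "master"))   # ['G', 'H', 'I', 'J']
print(log_excluding("branchB", "branchA"))  # ['K', 'L', 'M', 'N', 'O']
```

The real git log also orders and formats the commits; the sketch only models which commits are selected.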

The merge base

Now that we have a proper grasp on what it means to use a branch name to identify a tip commit and how to draw branches, we're ready to look at how Git implements merges. The merge operation starts with us having checked out some commit—usually some branch name, so that HEAD is attached to the branch name:

A--B--C--D--E--F   <-- master (HEAD)
    \
     G--H--I--J   <-- branchA
         \
          K--L--M--N--O   <-- branchB

We then run git merge with an argument that is typically another branch name. It does not have to be a branch name—anything that identifies any commit suffices! We can use the raw hash ID of commit J instead of using the word branchA. Git will, at this point, locate commit J in the commit graph. Or, we can git merge branchB to select commit O as the other commit. We can even git merge <hash-of-K> to select commit K as the other commit, regardless of the fact that it's not the tip of any branch.

Now the first bit of magic happens. Git uses the graph to find a merge base commit. The merge base is some commit that is on both branches—but it's not just any such commit, it's the best commit, for some definition of best. Technically this is the Lowest Common Ancestor, but in the graph we have above, there's an obvious best one:

  • For master and branchA, the merge base is commit B: that's the one that involves the least backwards travel from both commits F and J, that is on both branches.
  • For master and branchB, the merge base is still commit B.
  • For master and K, the merge base is still commit B.

In fact, for any commit not along the top row, the merge base will be B. (For a commit along the top row, such as commit D, the merge base will be that commit—but such a commit is already merged and Git will say that there is nothing to do.)
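One way to sketch "best common commit" in Python: collect the commits reachable from both sides, then keep those that are not an ancestor of another common commit. This is a brute-force illustration on the toy graph, not the algorithm Git actually runs, and `merge_bases` is a hypothetical helper name.

```python
# Toy copy of the full graph: each commit maps to its list of parents.
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"], "F": ["E"],
    "G": ["B"], "H": ["G"], "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"], "M": ["L"], "N": ["M"], "O": ["N"],
}


def reachable(tip):
    """Return every commit found by walking child-to-parent links from tip."""
    seen, todo = set(), [tip]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(parents[c])
    return seen


def merge_bases(x, y):
    """Common ancestors that are not a proper ancestor of another common ancestor."""
    common = reachable(x) & reachable(y)
    return {c for c in common
            if not any(c != d and c in reachable(d) for d in common)}


print(merge_bases("F", "J"))  # {'B'}  master and branchA
print(merge_bases("F", "O"))  # {'B'}  master and branchB
print(merge_bases("F", "K"))  # {'B'}  master and commit K
print(merge_bases("F", "D"))  # {'D'}  D is already on master
```

The function returns a set because more complicated graphs can have more than one best candidate; in this graph there is always exactly one.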

Merging

Once Git has located the merge base commit, the rest is easy! Well, maybe not that easy. :-) But the process is now well defined. Call the current commit L, for Left or Local or --ours. Call the other commit R, for Right or Remote or --theirs. Call the merge base B for base. (These letters name roles in the merge; they are not the commits labeled L and B in the drawings above.) Git now does the equivalent of:

git diff --find-renames B L > /tmp/ours
git diff --find-renames B R > /tmp/theirs

to find out what we changed since the merge base commit, and what they changed since that same merge base. In our case, if we merge branchA, the merge base really is B, the L(eft|ocal) commit is actually F, and the R(ight|emote) commit is J. Git finds the various changes, combines those changes, applies the result to the merge base's contents, and—if all goes well—commits the resulting snapshot.
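The "combine the changes" step can be sketched at whole-file granularity. This is a deliberate simplification and an assumption of the example: Git combines changes within files and also handles renames, while the hypothetical `three_way_merge` below only compares entire file contents. The file names and contents are invented.

```python
def three_way_merge(base, ours, theirs):
    """Combine two sets of changes against a common base, one whole file at a time.

    Each argument maps a path to file content; a missing path means "no such file".
    Returns (merged snapshot, list of conflicting paths).
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, l, r = base.get(path), ours.get(path), theirs.get(path)
        if l == r:      # both sides agree (same edit, or neither touched it)
            result = l
        elif l == b:    # only they changed it: take theirs
            result = r
        elif r == b:    # only we changed it: keep ours
            result = l
        else:           # both changed it, differently
            conflicts.append(path)
            continue
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"README": "v1", "main.py": "v1"}
ours = {"README": "v2", "main.py": "v1"}                          # we edited README
theirs = {"README": "v1", "main.py": "v1b", "new.txt": "hello"}   # they edited main.py, added new.txt

print(three_way_merge(base, ours, theirs))
# ({'README': 'v2', 'main.py': 'v1b', 'new.txt': 'hello'}, [])
print(three_way_merge({"f": "1"}, {"f": "2"}, {"f": "3"}))
# ({}, ['f'])
```

The important point is that neither side is compared directly against the other; each is compared against the merge base, which is why the choice of base matters so much.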

This new commit has two parents. The first parent is the tip of the existing branch, as usual. The second parent is simply the other commit. So in our case, we get this:

A--B--C--D--E--F--P   <-- master (HEAD)
    \            /
     G--H--I----J   <-- branchA
         \
          K--L--M--N--O   <-- branchB

where P is our new merge commit.

The existence of a merge commit changes the graph

Suppose we do the above, then run git merge branchB. We already know that merge starts by finding the merge base commit. But now the merge base is no longer commit B!

The instructions for finding the merge base say that we should walk through all reachable commits, starting from each branch tip and working backwards as usual. But the tip of master is now P, which has two parents. The first parent takes us to F and the second parent takes us to J. Working backwards from both F and J, we get to E and I, and then D and H.

Meanwhile, starting from O—the tip of branchB—we walk back to N, then M, then L, then K, then H. So if we merge now, after making merge commit P, our merge base is commit H.
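The shift in merge base can be checked on a toy model in Python. As before, this is a brute-force sketch with made-up helper names (`reachable`, `merge_bases`), not Git's own algorithm.

```python
# Toy copy of the graph before any merge: each commit maps to its parents.
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"], "F": ["E"],
    "G": ["B"], "H": ["G"], "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"], "M": ["L"], "N": ["M"], "O": ["N"],
}


def reachable(tip):
    """Return every commit found by walking child-to-parent links from tip."""
    seen, todo = set(), [tip]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(parents[c])
    return seen


def merge_bases(x, y):
    """Common ancestors that are not a proper ancestor of another common ancestor."""
    common = reachable(x) & reachable(y)
    return {c for c in common
            if not any(c != d and c in reachable(d) for d in common)}


before = merge_bases("F", "O")  # master is still at F
parents["P"] = ["F", "J"]       # merge commit P: first parent F, second parent J
after = merge_bases("P", "O")   # master is now at P

print(before)  # {'B'}
print(after)   # {'H'}
```

Adding P makes G through J reachable from master, so H becomes a commit that both tips share.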

Note that if we don't merge branchA first and instead merge branchB first, we won't have commit P yet, and the merge base will be commit B. Assuming all goes well, the result would instead be this:

A--B--C--D--E--F----------P   <-- master (HEAD)
    \                    /
     G--H--I--J   <-----/----- branchA
         \             /
          K--L--M--N--O   <-- branchB

Pull requests are not merges

A pull request is when you ask someone to take some of your commits and do their own merge.

They—whoever "they" are—will do their own git merge, or perhaps do something different that causes them to copy your commits to new and different commits. They will create merge commit P, or not.

Once they have done this, if they have done it using git merge so that they have an actual merge that uses your original commits, it won't matter which pull request they handle first and which second. They will have an actual merge that incorporates your original commits, with those same hash IDs, so their Git will see in its commit graph the same history you would see if you had done the merge yourself. Their Git will then find the correct new merge base and automatically do the right thing.

But if they don't use a merge at all, or copy your commits before merging, then you and they will be left with a minor mess. (The GitHub web interface calls these operations "rebase and merge", which doesn't merge at all, and "squash and merge", which also doesn't merge at all.) They will have some other commit or commits that incorporate the work you did in the commits that you had on both your branchA and branchB. Their Git won't necessarily be able to deal with this.
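The difference shows up clearly in a toy model. In the Python sketch below, P is a true merge of branchA, while S is a hypothetical squash commit that carries the same changes but records only F as its parent. The helper names are made up and the model ignores file contents entirely; it looks only at the shape of the graph.

```python
# Toy copy of the graph before any merge: each commit maps to its parents.
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"], "F": ["E"],
    "G": ["B"], "H": ["G"], "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"], "M": ["L"], "N": ["M"], "O": ["N"],
}


def reachable(tip):
    """Return every commit found by walking child-to-parent links from tip."""
    seen, todo = set(), [tip]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(parents[c])
    return seen


def merge_bases(x, y):
    """Common ancestors that are not a proper ancestor of another common ancestor."""
    common = reachable(x) & reachable(y)
    return {c for c in common
            if not any(c != d and c in reachable(d) for d in common)}


parents["P"] = ["F", "J"]  # true merge: the graph records that J was merged
parents["S"] = ["F"]       # hypothetical squash: same work, no link back to J

print(merge_bases("P", "O"))                    # {'H'}
print(merge_bases("S", "O"))                    # {'B'}
print(sorted(reachable("O") - reachable("S")))  # ['G', 'H', 'K', 'L', 'M', 'N', 'O']
```

After the squash, G and H still look unmerged when branchB is compared against S, even though their work is already in S. That duplicated work is the mess that a later rebase has to untangle.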

If that's the case, they will probably ask you to rebase one of your pull requests anyway. It won't matter too much which one they did first: you will still have to untangle the mess. The amount of mess will be about the same either way.

torek answered Oct 09 '22 02:10


Long story short: It depends on your flow.

B has all of A's commits from before the checkout, but since A has commits that are not in B (committed after the checkout), they are different branches and should be treated as such.

Merging B to master will add to master all of B's commits, plus the ones from A from before the checkout.

Merging A to master will add all of A's commits; when B is then merged to master, the only commits added are the difference between the two branches (the commits in B that are not in A).

Which branch you merge where depends on what you are doing, what each branch changes, and what actually makes sense. Just keep the above in mind.

Good luck!

Mauricio Machado answered Oct 09 '22 02:10