
How does Git select the rebase starting point?

Tags:

git

rebase

How does git rebase select a starting commit from the source (often feature) branch?

I am guessing Git goes back to the common ancestor of the src and dst branches.

What if the two branches have no common commits?

Vortex asked Jan 06 '23 15:01


1 Answer

One useful thing to know—you may already know this—is that rebase works by copying commits. It copies just the right commits, making the new copies go right after the end of the new base.

The selection of commits to rebase (to copy) relies on one of the most important things to know about Git: how it selects commits. When you understand this, you will also understand how git log and git rev-list work.¹

First, remember that Git's commits form a graph (specifically a Directed Acyclic Graph or DAG, but you don't need to worry about that for a long time yet). Each commit remembers its parent, or for a merge commit, all of its parents. When there are no merge commits in the part of the graph we draw, we'll get a tree structure rather than an arbitrary DAG. Rebase works best when you don't have merges, since rebase normally throws away merges anyway.

We can, and you should, draw these graphs. You can get Git to do this for you, with git log --graph for instance. It draws them vertically, which takes too much room for our purposes here, so we'll draw them horizontally instead, with newer commits on the right and older ones on the left.

Here's an example graph:

...<- o <- o <- o <- o
            \
             o <- o <- o <- o

Each o represents some commit in the graph. In formal graph theory each commit would be a vertex in the graph, but sometimes these are called nodes instead, and I tend to use the words "nodes" and "commit nodes" to describe them.

The "true name" of each commit node is a Git hash, one of those big ugly 40-character a234567... things. Given a Git hash, Git can look up any object (including, of course, commits) in the repository. But somehow we have to remember these "true names", which are not at all memorable.

Since each commit remembers its parent, though, we can start from any commit and work backwards in history (but not forwards!). What we need is to remember the most recent or tip-most commit of a branch. We get Git to do this for us, by having Git save the big ugly hash in a branch name, like master or develop.

You can use git rev-parse to turn such a name into a hash:

$ git rev-parse master
08bb3500a2a718c3c78b0547c68601cafa7a8fd9

This means that master points to the commit whose real name is 08bb350.... That commit has, inside it, the real name of a previous commit, and so on.
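To make this concrete, here is a toy Python model (a sketch, not how Git actually stores anything): a dictionary mapping each commit ID to its parents' IDs, plus a branch table that holds only tip IDs. The short IDs and the names parents, branches, rev_parse and walk_back are all hypothetical. Note that the walk can only go backwards, because each commit records its parents and never its children.

```python
# Toy model of a commit graph: each commit ID maps to the list of its parent IDs.
# The short IDs are hypothetical stand-ins for real 40-character hashes.
parents = {
    "a1": [],      # root commit: no parents
    "b2": ["a1"],
    "c3": ["b2"],
    "d4": ["c3"],
}

# A branch name stores only the ID of the branch's tip-most commit.
branches = {"master": "d4"}


def rev_parse(name):
    """Turn a branch name into a commit ID (a tiny slice of what `git rev-parse` does)."""
    return branches[name]


def walk_back(commit_id):
    """Yield commits from commit_id backwards, following the first parent each time."""
    while commit_id is not None:
        yield commit_id
        ps = parents[commit_id]
        commit_id = ps[0] if ps else None


history = list(walk_back(rev_parse("master")))
print(history)  # ['d4', 'c3', 'b2', 'a1']
```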

Let's draw that example graph again but add the branch names this time. I'll make it more compact, too: we know commits always point "backwards" (to their parents), so there's no need to draw those in as arrows; we can just use connecting lines. And I'm going to mark two of the commits with * this time:

...--*--*--o--o        <-- master
         \
          o--o--o--o   <-- develop

Note that the name master selects, specifically, the very tip of the master branch. Likewise, the name develop selects just the tip of the develop branch. But Git often doesn't just select one commit. Often, when we tell Git to look at one commit in particular, we're really asking Git to consider that commit and all of its parents.

When we start from master and work backwards, we get the two commits that are exclusively on master (the tip, and the one before the tip) and then we get the second * commit, and the first *, and so on.

When we start from develop and work backwards, we get the four commits that are exclusively on develop, and then the second * commit, and then the first *, and so on.

That is, the two * commits are, in fact, on both branches.
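The same idea can be checked in a small Python sketch of the graph drawn above. The commit IDs are hypothetical: s1 and s2 are the two * commits, m1/m2 are exclusive to master, d1 through d4 are exclusive to develop, and "r" stands in for the "..." history. "On a branch" is modeled as "reachable by following parent links from the branch tip".

```python
# Hypothetical IDs for the drawing above.
parents = {
    "r": [], "s1": ["r"], "s2": ["s1"],
    "m1": ["s2"], "m2": ["m1"],
    "d1": ["s2"], "d2": ["d1"], "d3": ["d2"], "d4": ["d3"],
}
branches = {"master": "m2", "develop": "d4"}


def reachable(tip):
    """Return the set of commits reachable from tip by following all parent links."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


on_master = reachable(branches["master"])
on_develop = reachable(branches["develop"])
print(sorted(on_master & on_develop))  # ['r', 's1', 's2']: on both branches
print(sorted(on_develop - on_master))  # ['d1', 'd2', 'd3', 'd4']: only on develop
```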

Note that we can draw the graph like this just as easily:

          o--o          <-- master
         /
...--*--*--o--o--o--o   <-- develop

or like this:

          o--o        <-- master
         /
...--*--*
         \
          o--o--o--o   <-- develop

All these drawings represent the same graph, and there's nothing particularly special about master.

This is the heart of the problem that rebase must solve

If we want to rebase develop on master, git rebase must somehow pick the four commits that are only on develop, while excluding all the commits that are also on master.

This is where Git's X..Y syntax comes in. Oddly, rebase doesn't use it! There's a reason for that, but let's just look at the syntax for the moment. With this syntax (in this case, master..develop), we ask Git to start from the tip of develop and select every commit it can reach from there, going back in time all the way to the beginning; but also to start from the tip of master and un-select every commit it can reach from there.

I like to think of this as temporarily painting commits green (go) and red (stop). We can do the green paint first, painting the four os on develop plus the two *s plus whatever comes before them, then put red paint on top starting from the two os on master and continuing on to the two *s and everything that comes before those. Or, we can do the red paint first, and then do the green paint but stop painting as soon as we find a red node. Either way we'll wind up with just the four exclusive-to-develop commits "painted green".
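Here is the painting idea as a minimal Python sketch of what master..develop selects, using the same hypothetical IDs as the drawing (s1/s2 are the * commits, "r" stands in for "..."). The function rev_list is a hypothetical name, not Git's implementation: it paints red first, then paints green from the other tip and stops at any red node. Its output order is plain depth-first, which matches Git's newest-first order only for linear history like this.

```python
parents = {
    "r": [], "s1": ["r"], "s2": ["s1"],
    "m1": ["s2"], "m2": ["m1"],
    "d1": ["s2"], "d2": ["d1"], "d3": ["d2"], "d4": ["d3"],
}
branches = {"master": "m2", "develop": "d4"}


def reachable(tip):
    """All commits reachable from tip via parent links."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def rev_list(include, exclude):
    """Sketch of `exclude..include`: red paint first, then green paint that stops at red."""
    red = reachable(exclude)
    green, seen, stack = [], set(), [include]
    while stack:
        c = stack.pop()
        if c in red or c in seen:
            continue  # hit red paint (or already green): stop walking this path
        seen.add(c)
        green.append(c)
        stack.extend(parents[c])
    return green


picked = rev_list(branches["develop"], branches["master"])
print(picked)  # ['d4', 'd3', 'd2', 'd1']

# Green-first, then red on top, gives the same set of commits.
green_first = reachable(branches["develop"]) - reachable(branches["master"])
print(set(picked) == green_first)  # True
```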

This is how git rebase knows to copy those four commits, and not any other commits.

The place that rebase starts from is normally your current branch:

$ git branch
  diff-merge-base
  master
  precious
* stash-exp

(so in this case I'm currently on stash-exp).

The place that rebase copies to—or rather, "copies after"—comes from the argument to git rebase:

$ git rebase master

This, as it turns out, is also the place that git rebase gets its idea of "red commits" (what not to copy).

Rebase effectively takes your argument, such as master, and your current branch name (in my case, stash-exp, but let's say develop), and uses git rev-list² to get the IDs of the commits to copy:

$ git rev-list master..develop

(you have to run this before the rebase, of course).
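Putting the two halves together, a plain rebase can be sketched in Python as: select master..develop, copy those commits oldest first so each copy's parent is the previous copy (starting from master's tip), then move the branch name. This is a toy model under stated assumptions: the IDs are hypothetical, a copy's "new hash" is just the old ID plus a prime, and conflicts, merges and the real commit-writing machinery are all ignored. The function returns the old tip, which is roughly the role ORIG_HEAD plays.

```python
parents = {
    "r": [], "s1": ["r"], "s2": ["s1"],
    "m1": ["s2"], "m2": ["m1"],
    "d1": ["s2"], "d2": ["d1"], "d3": ["d2"], "d4": ["d3"],
}
branches = {"master": "m2", "develop": "d4"}


def reachable(tip):
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def rev_list(include, exclude):
    """exclude..include, newest first for linear history (depth-first walk)."""
    red = reachable(exclude)
    green, seen, stack = [], set(), [include]
    while stack:
        c = stack.pop()
        if c in red or c in seen:
            continue
        seen.add(c)
        green.append(c)
        stack.extend(parents[c])
    return green


def rebase(current, upstream):
    """Copy upstream..current after upstream's tip; return the old tip (like ORIG_HEAD)."""
    to_copy = rev_list(branches[current], branches[upstream])
    new_tip = branches[upstream]
    for c in reversed(to_copy):      # copy oldest first
        copy_id = c + "'"            # hypothetical ID for the new copy
        parents[copy_id] = [new_tip]
        new_tip = copy_id
    orig_head = branches[current]
    branches[current] = new_tip      # the originals stay in `parents`, just unnamed
    return orig_head


orig_head = rebase("develop", "master")
print(branches["develop"], parents["d1'"], orig_head)  # d4' ['m2'] d4
```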

Extra wrinkle #1

When you run git rebase, it checks whether the other branch (the one you're rebasing onto) already has copies of commits that you have. That is, suppose we look at the version of the graph we drew like this:

          o--o
         /
...--*--*
         \
          o--o--o--o

In this graph, there are two forks from the final common * commit. We could easily rebase either one onto the other. But what if one of the top-line o commits matches, more or less, one of the bottom-line o commits? It would be nice to omit the extra copy. Let's rebase the bottom line onto the top, but let's label these commits A, B, C, and D and note, on the top, that one of the os is Just Like B:

          o--B'
         /
...--*--*
         \
          A--B--C--D

(this is the kind of graph you get when you use cherry-pick, for instance). Commits B and B' are basically copies of each other. So when we rebase the lower four commits, we really should just copy A, C, and D, giving:

          o--B'
         /    \
...--*--*      A'-C'-D'
         \
          A--B--C--D

Last, let's put the labels back on. We want master to point to B', and develop to point to D', like this:

          o--B'          <-- master
         /    \
...--*--*      A'-C'-D'  <-- develop
         \
          A--B--C--D     [abandoned]
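The "already has a copy" check can be sketched in Python like this. Git's real check compares patch IDs (see git patch-id), which are hashes of the diff a commit introduces rather than of the commit itself. In this sketch a short description string stands in for each diff and hashlib.sha1 stands in for patch-id hashing; the IDs (Bp is B') and the changes table are hypothetical.

```python
import hashlib

# Hypothetical graph: s2 is the last common * commit; t1 and Bp (B') are on
# master; A, B, C, D are on develop. Bp makes the same change as B.
parents = {
    "s2": [],
    "t1": ["s2"], "Bp": ["t1"],
    "A": ["s2"], "B": ["A"], "C": ["B"], "D": ["C"],
}
# Stand-in for the diff each commit introduces; real Git hashes the actual diff.
changes = {
    "s2": "base", "t1": "update README", "Bp": "fix typo in parser",
    "A": "add option", "B": "fix typo in parser", "C": "use option", "D": "add tests",
}
branches = {"master": "Bp", "develop": "D"}


def reachable(tip):
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def rev_list(include, exclude):
    red = reachable(exclude)
    green, seen, stack = [], set(), [include]
    while stack:
        c = stack.pop()
        if c in red or c in seen:
            continue
        seen.add(c)
        green.append(c)
        stack.extend(parents[c])
    return green


def patch_id(c):
    """Hash of the change itself, ignoring author and date (loosely like `git patch-id`)."""
    return hashlib.sha1(changes[c].encode()).hexdigest()


def commits_to_copy(branch, upstream):
    mine = rev_list(branches[branch], branches[upstream])    # only on branch
    theirs = rev_list(branches[upstream], branches[branch])  # only on upstream
    already_there = {patch_id(c) for c in theirs}
    return [c for c in reversed(mine) if patch_id(c) not in already_there]


print(commits_to_copy("develop", "master"))  # ['A', 'C', 'D']
```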

What happens to the original A--B--C--D chain? We've labeled it "abandoned" here, but in fact, Git hangs on to it for a while, using both the reflog mechanism (e.g., we can ask Git to find develop@{1}, which finds the original commit D) and the special name ORIG_HEAD, which rebase sets up to point to D. The reflog entry sticks around for 30 days by default,³ while the name ORIG_HEAD sticks around until something (usually another rebase) overwrites it.

Extra wrinkle #2

Sometimes, this bit of Git magic—use one name like master to "paint commits red", and then use the same name to decide where to put the copies—is insufficient. For some cases, you need to tell git rebase to stop copying at some specific point, but to put the new copies somewhere else. In this case, you can use git rebase --onto:

git rebase --onto target upstream

(the rebase documentation calls the red-paint "stop" argument upstream). The default is that upstream is both the --onto target and the stop-copy red-paint indicator, and this works when the point to stop at is in the right place in the history of the point to copy onto. That's usually true, and quite often (but not always) the thing to give as the upstream for some branch foo is an origin/foo remote-tracking branch that you will set⁴ as the upstream for foo, which I think is why rebase calls this argument upstream.
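A typical case for --onto is a branch built on top of another feature branch. The Python sketch below (hypothetical IDs and function names, same toy model: copies get a prime suffix, conflicts ignored) separates the two roles: upstream decides what not to copy, and onto decides where the copies go. It corresponds to the documented form git rebase --onto master feature topic.

```python
# Hypothetical graph:
#   m0--m1            <-- master
#     \
#      f1--f2         <-- feature
#             \
#              t1--t2 <-- topic
parents = {
    "m0": [], "m1": ["m0"],
    "f1": ["m0"], "f2": ["f1"],
    "t1": ["f2"], "t2": ["t1"],
}
branches = {"master": "m1", "feature": "f2", "topic": "t2"}


def reachable(tip):
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def rev_list(include, exclude):
    red = reachable(exclude)
    green, seen, stack = [], set(), [include]
    while stack:
        c = stack.pop()
        if c in red or c in seen:
            continue
        seen.add(c)
        green.append(c)
        stack.extend(parents[c])
    return green


def rebase_onto(branch, upstream, onto):
    """Copy upstream..branch (oldest first) after onto's tip; plain rebase has onto == upstream."""
    to_copy = rev_list(branches[branch], branches[upstream])
    new_tip = branches[onto]
    for c in reversed(to_copy):
        parents[c + "'"] = [new_tip]
        new_tip = c + "'"
    branches[branch] = new_tip
    return to_copy


# A plain `git rebase master` on topic would also copy feature's commits:
print(rev_list(branches["topic"], branches["master"]))  # ['t2', 't1', 'f2', 'f1']
# With --onto, feature paints red and master receives the copies:
copied = rebase_onto("topic", upstream="feature", onto="master")
print(copied, parents["t1'"])  # ['t2', 't1'] ['m1']
```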

What if the two branches have no common commits?

In this case, the "paint commit nodes red" step has no effect on the "paint commits green" step:

o--o--o--o   <-- master

o--o--o      <-- unrelated

If you're on branch unrelated and you run git rebase master, Git effectively paints the three unrelated-branch commits green and the four master-branch commits red, then takes the green-painted commits, which are the three commits reachable from unrelated's tip commit. The rebase code then copies those commits:

o--o--o--o           <-- master
          \
           o--o--o   <-- unrelated

o--o--o              [abandoned]
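In the same toy Python model (hypothetical IDs, prime-suffixed copies, conflicts ignored), two histories with separate root commits show why the red paint has no effect: none of the red commits is reachable from unrelated's tip, so everything reachable from that tip is selected and copied after master's tip.

```python
# Two hypothetical histories with no commit in common (each has its own root).
parents = {
    "m1": [], "m2": ["m1"], "m3": ["m2"], "m4": ["m3"],  # master
    "u1": [], "u2": ["u1"], "u3": ["u2"],                # unrelated
}
branches = {"master": "m4", "unrelated": "u3"}


def reachable(tip):
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def rev_list(include, exclude):
    red = reachable(exclude)
    green, seen, stack = [], set(), [include]
    while stack:
        c = stack.pop()
        if c in red or c in seen:
            continue
        seen.add(c)
        green.append(c)
        stack.extend(parents[c])
    return green


red = reachable(branches["master"])
print(red.isdisjoint(reachable(branches["unrelated"])))  # True: red never meets green
green = rev_list(branches["unrelated"], branches["master"])
print(green)  # ['u3', 'u2', 'u1']: all three unrelated commits

# Copy oldest first after master's tip, then move the name.
new_tip = branches["master"]
for c in reversed(green):
    parents[c + "'"] = [new_tip]
    new_tip = c + "'"
old_tip, branches["unrelated"] = branches["unrelated"], new_tip
```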

¹Well, git rev-list has about a million flags, so this is a bit of an overstatement: understanding commit selection won't help you much with all of those flags. :-)

²There are a number of side complications here: sometimes git rebase actually uses git rev-list directly, and sometimes it doesn't. The effect is pretty much the same, though.

³This is configurable: gc.reflogExpire and gc.reflogExpireUnreachable control the defaults, and there are additional names you can set for specific patterns.

⁴You can set this explicitly with git branch --set-upstream-to, but for these kinds of branches, it's generally set automatically when you use git checkout to create the branch initially. Once it's set, git rebase, with no extra arguments, will find it automatically as well.

torek answered Jan 17 '23 14:01