How does git rebase select a starting commit from the source (often feature) branch?
I am guessing git goes back to the common ancestor of src and dst branch.
What if the two branches have no common commits?
One useful thing to know—you may already know this—is that rebase works by copying commits. It copies just the right commits, making the new copies go right after the end of the new base.
The selection of commits to rebase (to copy) actually uses one of the most crucial things to know about Git and its commit selection. When you understand this, you will also understand how git log and git rev-list work.[1]
First, remember that Git's commits form a graph (specifically a Directed Acyclic Graph or DAG, but you don't need to worry about that for a long time yet). Each commit remembers its parent, or for a merge commit, all of its parents. When there are no merge commits in the part of the graph we draw, we'll get a tree structure rather than an arbitrary DAG. Rebase works best when you don't have merges, since rebase normally throws away merges anyway.
We can (and you should) draw these graphs. You can get Git to do this for you, with git log --graph for instance. It draws them vertically, which takes too much room for our purposes here, so we'll draw them horizontally instead, with newer commits on the right and older ones on the left.
Here's an example graph:
...<- o <- o <- o <- o
             \
              o <- o <- o <- o
Each o represents some commit in the graph. In formal graph theory each commit would be a vertex in the graph, but sometimes these are called nodes instead, and I tend to use the words "nodes" and "commit nodes" to describe them.
The "true name" of each commit node is a Git hash, one of those big ugly 40-character a234567... things. Given a Git hash, Git can look up any object (including, of course, commits) in the repository. But somehow we have to remember these "true names", which are not at all memorable.
Since each commit remembers its parent, though, we can start from any commit and work backwards in history (but not forwards!). What we need is to remember the most recent or tip-most commit of a branch. We get Git to do this for us, by having Git save the big ugly hash in a branch name, like master or develop.
You can use git rev-parse to turn such a name into a hash:
$ git rev-parse master
08bb3500a2a718c3c78b0547c68601cafa7a8fd9
This means that master points to the commit whose real name begins with 08bb350. That commit has, inside it, the real name of a previous commit (its parent), and so on.
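To see both halves of this (a name resolving to a hash, and a commit storing its parent's hash), here is a minimal sketch run in a throwaway repository. The temporary directory, the demo identity, and the commit messages are all made up for illustration; only the git commands themselves are real.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway directory for a demo repo
git init -q .
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is named master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

tip=$(git rev-parse master)      # the hash stored under the name master
parent=$(git rev-parse master^)  # the hash of the tip commit's parent
echo "master points to $tip"
echo "its parent is    $parent"

# The raw commit object records the parent hash inside the commit itself.
git cat-file -p master | grep '^parent '
```

The last command prints the parent line of the tip commit, which should carry the same hash that master^ resolved to.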
Let's draw that example graph again but add the branch names this time. I'll make it more compact, too: we know commits always point "backwards" (to their parents), so there's no need to draw those in as arrows; we can just use connecting lines. And I'm going to mark two of the commits with * this time:
...--*--*--o--o        <-- master
         \
          o--o--o--o   <-- develop
Note that the name master selects, specifically, the very tip of the master branch. Likewise, the name develop selects just the tip of the develop branch. But Git often doesn't just select one commit. Often, when we tell Git to look at one commit in particular, we're really asking Git to consider that commit and all of its ancestors (its parents, their parents, and so on).
When we start from master and work backwards, we get the two commits that are exclusively on master (the tip, and the one before the tip) and then we get the second * commit, and the first *, and so on.
When we start from develop and work backwards, we get the four commits that are exclusively on develop, and then the second * commit, and then the first *, and so on.
That is, the two * commits are, in fact, on both branches.
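You can check the "on both branches" claim on a small reproduction of this graph. The sketch below builds it in a throwaway repository (the directory, identity, and commit messages are invented for the demo) and then asks Git which branches contain the second * commit, and what the common ancestor of the two tips is.

```shell
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# The two shared "*" commits.
git commit -q --allow-empty -m "star 1"
git commit -q --allow-empty -m "star 2"
star2=$(git rev-parse HEAD)
git branch develop                       # develop forks off at the second *

# Two commits only on master.
git commit -q --allow-empty -m "master 1"
git commit -q --allow-empty -m "master 2"

# Four commits only on develop.
git checkout -q develop
for i in 1 2 3 4; do
    git commit -q --allow-empty -m "develop $i"
done

git branch --contains "$star2"           # lists both develop and master
base=$(git merge-base master develop)    # best common ancestor of the two tips
echo "merge base: $base"
```

Here git merge-base should report the second * commit, which is the "common ancestor" the question was guessing at.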
Note that we can draw the graph like this just as easily:
          o--o   <-- master
         /
...--*--*--o--o--o--o   <-- develop
or like this:
          o--o   <-- master
         /
...--*--*
         \
          o--o--o--o   <-- develop
All these drawings represent the same graph, and there's nothing particularly special about master.
The problem rebase must solve
If we want to rebase develop on master, git rebase must somehow pick the four commits that are only on develop, while excluding all the commits that are also on master.
This is where Git's X..Y syntax comes in. Oddly, rebase doesn't use it! There's a reason for that, but let's just look at the syntax for the moment. With this syntax (in this case, master..develop) we ask Git to start from a tip commit, the tip of develop, and select every commit it can reach from there going back in time, all the way to the beginning; but also to start from the tip of master, and un-select every commit going back in time from there.
I like to think of this as temporarily painting commits green (go) and red (stop). We can do the green paint first, painting the four o commits on develop plus the two * commits plus whatever comes before them, then put red paint on top starting from the two o commits on master and continuing on to the two * commits and everything that comes before those. Or, we can do the red paint first, and then do the green paint but stop painting as soon as we find a red node. Either way we'll wind up with just the four exclusive-to-develop commits "painted green".
This is how git rebase knows to copy those four commits, and not any other commits.
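The green/red painting can be tried directly with git rev-list. This is a sketch on a throwaway copy of the example graph (directory, identity, and commit messages are invented for the demo): two shared commits, two more on master, four more on develop.

```shell
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "star 1"
git commit -q --allow-empty -m "star 2"
git branch develop
git commit -q --allow-empty -m "master 1"
git commit -q --allow-empty -m "master 2"
git checkout -q develop
for i in 1 2 3 4; do
    git commit -q --allow-empty -m "develop $i"
done

# Green from develop, red from master: only develop's own commits remain.
git log --format='%h %s' master..develop

# The same selection spelled out as "reachable from develop, not from master".
git rev-list develop ^master

count=$(git rev-list --count master..develop)
echo "commits selected: $count"
```

Swapping the two names (develop..master) selects the two master-only commits instead, which is why the order of the names matters.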
The place that rebase starts from is normally your current branch:
$ git branch
  diff-merge-base
  master
  precious
* stash-exp
(so in this case I'm currently on stash-exp).
The place that rebase copies to (or rather, "copies after") comes from the argument to git rebase:
$ git rebase master
This, as it turns out, is also the place that git rebase gets its idea of "red commits" (what not to copy).
Rebase effectively takes your argument, such as master, and your current branch name (in my case stash-exp, but let's say develop) and uses git rev-list[2] to get the IDs of the commits to copy:
$ git rev-list master..develop
(you have to run this before the rebase, of course).
When you run git rebase, it also checks whether the other branch (the one you're rebasing onto) already has copies of commits that you have. That is, suppose we look at the version of the graph we drew like this:
          o--o
         /
...--*--*
         \
          o--o--o--o
In this graph, there are two forks from the final common * commit. We could easily rebase either one onto the other. But what if one of the top-line o commits matches, more or less, one of the bottom-line o commits? It would be nice to omit the extra. Let's rebase the bottom line onto the top, but let's label these commits A, B, C, and D and note, on the top, that one of the o commits is Just Like B:
          o--B'
         /
...--*--*
         \
          A--B--C--D
(This is the kind of graph you get when you use cherry-pick, for instance.) Commits B and B' are basically copies of each other. So when we rebase the lower four commits, we really should just copy A, C, and D, giving:
          o--B'
         /    \
...--*--*      A'-C'-D'
         \
          A--B--C--D
Last, let's put the labels back on. We want master to point to B', and develop to point to D', like this:
          o--B'   <-- master
         /    \
...--*--*      A'-C'-D'   <-- develop
         \
          A--B--C--D   [abandoned]
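The A, C, D result can be reproduced on a throwaway repository. In this sketch the helper mk, the file names, and the identity are all invented for the demo: each commit adds one small file so that the cherry-picked copy B' introduces exactly the same change as B. As far as I know, rebase detects such duplicates by comparing the patches, which is why real file changes are used here.

```shell
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: commit a new file named after the commit.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk base                         # stands in for the final shared * commit
git branch develop

git checkout -q develop
mk A; mk B; mk C; mk D          # bottom line of the graph

git checkout -q master
mk o                            # top line: one ordinary commit ...
git cherry-pick develop~2 >/dev/null   # ... plus B', a copy of B

git checkout -q develop
git rebase master               # may warn that an already-applied commit was skipped

subjects=$(git log --reverse --format=%s master..develop)
echo "copied onto master:" $subjects
```

After the rebase, only the copies of A, C, and D sit on top of master; B was left out because master already has B'.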
What happens to the original A--B--C--D chain? We've labeled it "abandoned" here, but in fact, Git hangs on to it for a while, using both the reflog mechanism (e.g., we can ask Git to find develop@{1}, which finds the original commit D) and also the special name ORIG_HEAD, which rebase sets up to point to D. The reflog entry sticks around for 30 days by default,[3] while the name ORIG_HEAD sticks around until something (usually another rebase) overwrites it.
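Both ways of finding the abandoned tip can be checked right after a rebase. The sketch below uses a throwaway repository with an invented helper mk, invented file names, and a demo identity; it records the old tip of develop, rebases, and then resolves develop@{1} and ORIG_HEAD.

```shell
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: commit a new file named after the commit.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk base
git branch develop
mk m1                               # master moves ahead
git checkout -q develop
mk d1; mk d2

old_tip=$(git rev-parse develop)    # the commit about to be "abandoned"
git rebase -q master
new_tip=$(git rev-parse develop)

echo "old tip:     $old_tip"
echo "new tip:     $new_tip"
echo "develop@{1}: $(git rev-parse 'develop@{1}')"
echo "ORIG_HEAD:   $(git rev-parse ORIG_HEAD)"
```

In this simple case both names should resolve to the old tip, so something like git reset --hard ORIG_HEAD would undo the rebase, as long as nothing has overwritten ORIG_HEAD in the meantime.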
Sometimes, this bit of Git magic (use one name like master to "paint commits red", and then use the same name to decide where to put the copies) is insufficient. For some cases, you need to tell git rebase to stop copying at some specific point, but to put the new copies somewhere else. In this case, you can use git rebase --onto:
git rebase --onto target upstream
(the rebase documentation calls the red-paint "stop" argument upstream). The default is that upstream is both the --onto target and the stop-copy red-paint indicator, and it works when the point to stop is in the right place in the history of the point to copy-onto. That's usually true, and quite often (but not always), the thing to give as the upstream for some branch foo is an origin/foo remote-tracking branch that you will set[4] as the upstream for foo, which I think is why rebase calls this argument upstream.
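A common case where the two roles differ is a branch that was started on top of another topic branch. In the sketch below, the branch names feature-a and feature-b, the helper mk, and the file names are all hypothetical. feature-b was forked from feature-a, and we want only feature-b's own commits copied onto master: feature-a is the "stop" (upstream) argument and master is the --onto target.

```shell
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: commit a new file named after the commit.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk base
git branch feature-a
mk m1                               # master moves ahead

git checkout -q feature-a
mk a1; mk a2                        # commits we do NOT want to copy

git checkout -q -b feature-b        # forked from feature-a, not from master
mk b1; mk b2                        # commits we DO want to copy

# Red paint from feature-a, copies placed after master.
git rebase -q --onto master feature-a

subjects=$(git log --reverse --format=%s master..feature-b)
echo "on top of master:" $subjects
```

A plain git rebase master here would have copied a1 and a2 as well, since they are not reachable from master.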
What if the two branches have no common commits?
In this case, the "paint commit nodes red" step has no effect on the "paint commits green" step:
o--o--o--o <-- master
o--o--o <-- unrelated
If you're on branch unrelated and you run git rebase master, Git effectively paints the three unrelated-branch commits green and the four master-branch commits red, then takes the green-painted commits, which are the three commits reachable from unrelated's tip commit. The rebase code then copies those commits:
o--o--o--o   <-- master
          \
           o--o--o   <-- unrelated

o--o--o   [abandoned]
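This case can be tried out with an orphan branch, which starts a history with no commits in common with master. The sketch below is only an illustration under stated assumptions: the helper mk, the file names, and the identity are invented, and the two histories touch different files so that the copies apply without conflicts.

```shell
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: commit a new file named after the commit.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk m1; mk m2; mk m3; mk m4          # four commits on master

git checkout -q --orphan unrelated  # new branch with no parent commit
git rm -rfq .                       # start from an empty tree
mk u1; mk u2; mk u3                 # three commits sharing nothing with master

before=$(git rev-list --count master..unrelated)
echo "green commits before the rebase: $before"

git rebase -q master                # copies all three after master's tip

total=$(git rev-list --count unrelated)
echo "commits reachable from unrelated now: $total"
```

With no shared commits, the red paint removes nothing, so all three commits are selected, and after the rebase unrelated has master's four commits in its history plus the three copies.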
[1] Well, git rev-list has about a million flags, so this is a bit of an overstatement, since it won't help you so much with all the flags. :-)
[2] There are a number of side complications here: sometimes git rebase actually uses git rev-list directly, and sometimes it doesn't. The effect is pretty much the same, though.
[3] This is configurable: gc.reflogExpire and gc.reflogExpireUnreachable control the defaults, and there are additional names you can set for specific patterns.
[4] You can set this explicitly with git branch --set-upstream-to, but for these kinds of branches, it's generally set automatically when you use git checkout to create the branch initially. Once it's set, git rebase, with no extra arguments, will find it automatically as well.