
What exactly does git's "rebase --preserve-merges" do (and why?)

Tags:

git

git-rebase

Git's documentation for the rebase command is quite brief:

--preserve-merges     Instead of ignoring merges, try to recreate them.  This uses the --interactive machinery internally, but combining it with the --interactive option explicitly is generally not a good idea unless you know what you are doing (see BUGS below). 

So what actually happens when you use --preserve-merges? How does it differ from the default behavior (without that flag)? What does it mean to "recreate" a merge?

Chris asked Apr 10 '13 01:04




1 Answer

As with a normal git rebase, git with --preserve-merges first identifies a list of commits made in one part of the commit graph, and then replays those commits on top of another part. The differences with --preserve-merges concern which commits are selected for replay and how that replaying works for merge commits.

To be more explicit about the main differences between normal and merge-preserving rebase:

  • Merge-preserving rebase is willing to replay (some) merge commits, whereas normal rebase completely ignores merge commits.
  • Because it's willing to replay merge commits, merge-preserving rebase has to define what it means to replay a merge commit, and deal with some extra wrinkles:
    • The most interesting part, conceptually, is perhaps in picking what the new commit's merge parents should be.
    • Replaying merge commits also requires explicitly checking out particular commits (git checkout <desired first parent>), whereas normal rebase doesn't have to worry about that.
  • Merge-preserving rebase considers a shallower set of commits for replay:
    • In particular, it will only consider replaying commits made since the most recent merge base(s) (i.e. the most recent time the two branches diverged), whereas normal rebase might replay commits going back to the first time the two branches diverged.
    • To be provisional and unclear, I believe this is ultimately a means to screen out replaying "old commits" that have already been "incorporated into" a merge commit.

First I will try to describe "sufficiently exactly" what rebase --preserve-merges does, and then there will be some examples. One can of course start with the examples, if that seems more useful.

The Algorithm in "Brief"

If you want to really get into the weeds, download the git source and explore the file git-rebase--interactive.sh. (When this answer was written, rebase was not part of Git's C core but was implemented as a shell script. Behind the scenes, it shared code with "interactive rebase".)

But here I will sketch what I think is the essence of it. In order to reduce the number of things to think about, I have taken a few liberties. (e.g. I don't try to capture with 100% accuracy the precise order in which computations take place, and ignore some less central-seeming topics, e.g. what to do about commits that have already been cherry-picked between branches).

First, note that a non-merge-preserving rebase is rather simple. It's more or less:

    Find all commits on B but not on A ("git log A..B")
    Reset B to A ("git reset --hard A")
    Replay all those commits onto B one at a time in order.
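The selection step can be pictured with a toy model. This is a minimal sketch in Python, not git's own code: the graph, the commit names, and the helpers `ancestors` and `picks_for_plain_rebase` are all made up for illustration. It treats "git log A..B" as a set difference over parent links and then drops merge commits, as a normal rebase does.

```python
# Toy commit graph (hypothetical data): each commit maps to its parents.
parents = {
    "root": [],
    "a1": ["root"],          # tip of A
    "b1": ["root"],
    "b2": ["root"],
    "mrg": ["b1", "b2"],     # a merge commit on B's side
    "b3": ["mrg"],           # tip of B
}


def ancestors(graph, tip):
    """Return tip plus every commit reachable from it through parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen


def picks_for_plain_rebase(graph, a, b):
    """Commits on b but not on a (like "git log A..B"), minus merge commits."""
    in_range = ancestors(graph, b) - ancestors(graph, a)
    # Sorted only to make the output stable; a real replay goes oldest-first.
    return sorted(c for c in in_range if len(graph[c]) < 2)


print(picks_for_plain_rebase(parents, "a1", "b3"))  # ['b1', 'b2', 'b3']
```

The merge commit `mrg` is in the A..B range but is filtered out, which is the "completely ignores merge commits" behavior described earlier.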

Rebase --preserve-merges is comparatively complicated. Here's as simple as I've been able to make it without losing things that seem pretty important:

    Find the commits to replay:
      First find the merge-base(s) of A and B (i.e. the most recent common ancestor(s))
        This (these) merge base(s) will serve as a root/boundary for the rebase.
        In particular, we'll take its (their) descendants and replay them on top of new parents
      Now we can define C, the set of commits to replay. In particular, it's those commits:
        1) reachable from B but not A (as in a normal rebase), and ALSO
        2) descendants of the merge base(s)
      If we ignore cherry-picks and other cleverness preserve-merges does, it's more or less:
        git log A..B --not $(git merge-base --all A B)
    Replay the commits:
      Create a branch B_new, on which to replay our commits.
      Switch to B_new (i.e. "git checkout B_new")
      Proceeding parents-before-children (--topo-order), replay each commit c in C on top of B_new:
        If it's a non-merge commit, cherry-pick as usual (i.e. "git cherry-pick c")
        Otherwise it's a merge commit, and we'll construct an "equivalent" merge commit c':
          To create a merge commit, its parents must exist and we must know what they are.
          So first, figure out which parents to use for c', by reference to the parents of c:
            For each parent p_i in parents_of(c):
              If p_i is one of the merge bases mentioned above:
                # p_i is one of the "boundary commits" that we no longer want to use as parents
                For the new commit's ith parent (p_i'), use the HEAD of B_new.
              Else if p_i is one of the commits being rewritten (i.e. if p_i is in C):
                # Note: Because we're moving parents-before-children, a rewritten version
                # of p_i must already exist. So reuse it:
                For the new commit's ith parent (p_i'), use the rewritten version of p_i.
              Otherwise:
                # p_i is one of the commits that's *not* slated for rewrite. So don't rewrite it
                For the new commit's ith parent (p_i'), use p_i, i.e. the old commit's ith parent.
          Second, actually create the new commit c':
            Go to p_1'. (i.e. "git checkout p_1'", p_1' being the "first parent" we want for our new commit)
            Merge in the other parent(s):
              For a typical two-parent merge, it's just "git merge p_2'".
              For an octopus merge, it's "git merge p_2' p_3' p_4' ...".
            Switch (i.e. "git reset") B_new to the current commit (i.e. HEAD), if it's not already there
      Change the label B to apply to this new branch, rather than the old one. (i.e. "git reset --hard B")
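The two phases above (choose the commits, then choose parents for each rewritten commit) can be mimicked on a toy graph. The following is a sketch under stated assumptions, not git's implementation: the graph and the helper names `merge_bases`, `commits_to_replay` and `new_parents` are hypothetical, the parent-mapping rule is applied to every replayed commit rather than only to merges, and a boundary (merge-base) parent is mapped to the tip of A, which stands in for the starting HEAD of B_new.

```python
def ancestors(graph, tip):
    """Return tip plus every commit reachable from it through parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen


def merge_bases(graph, a, b):
    """Common ancestors of a and b that are not behind another common ancestor."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {c for c in common
            if not any(c != d and c in ancestors(graph, d) for d in common)}


def commits_to_replay(graph, a, b):
    """Reachable from b but not from a, AND descended from some merge base."""
    bases = merge_bases(graph, a, b)
    in_range = ancestors(graph, b) - ancestors(graph, a)
    return {c for c in in_range
            if any(base in ancestors(graph, c) for base in bases)}


def new_parents(graph, a, b):
    """Map each rewritten commit (marked with ') to the parents it would get."""
    bases = merge_bases(graph, a, b)
    chosen = commits_to_replay(graph, a, b)

    def remap(p):
        if p in bases:
            return a           # boundary commit: attach to the new base instead
        if p in chosen:
            return p + "'"     # reuse the already-rewritten copy
        return p               # not slated for rewrite: keep the original

    return {c + "'": [remap(p) for p in graph[c]] for c in chosen}


# Hypothetical graph: "up" is the tip of A, "mrg" is the tip of B.
graph = {"base": [], "up": ["base"], "x": ["base"], "y": ["base"],
         "mrg": ["x", "y"]}

for commit, ps in sorted(new_parents(graph, "up", "mrg").items()):
    print(commit, "<-", ps)
```

In this model `x'` and `y'` both land on `up`, and `mrg'` is recreated with the rewritten `x'` and `y'` as its parents.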

Rebase with an --onto C argument should be very similar. Just instead of starting commit playback at the HEAD of B, you start commit playback at the HEAD of C instead. (And use C_new instead of B_new.)

Example 1

For example, take commit graph

      B---C <-- master
     /
    A-------D------E----m----H <-- topic
             \         /
              F-------G

m is a merge commit with parents E and G.

Suppose we rebased topic (H) on top of master (C) using a normal, non-merge-preserving rebase. (For example, checkout topic; rebase master.) In that case, git would select the following commits for replay:

  • pick D
  • pick E
  • pick F
  • pick G
  • pick H

and then update the commit graph like so:

      B---C <-- master
     /     \
    A       D'---E'---F'---G'---H' <-- topic

(D' is the replayed equivalent of D, etc.)

Note that merge commit m is not selected for replay.

Suppose we instead did a --preserve-merges rebase of H on top of C. (For example, checkout topic; rebase --preserve-merges master.) In this new case, git would select the following commits for replay:

  • pick D
  • pick E
  • pick F (onto D' in the 'subtopic' branch)
  • pick G (onto F' in the 'subtopic' branch)
  • pick Merge branch 'subtopic' into topic
  • pick H

Now m was chosen for replay. Also note that merge parents E and G were picked for inclusion before merge commit m.

Here is the resulting commit graph:

      B---C <-- master
     /     \
    A      D'-----E'----m'----H' <-- topic
            \          /
             F'-------G'

Again, D' is a cherry-picked (i.e. recreated) version of D. Same for E', etc. Every commit not on master has been replayed. Both E and G (the merge parents of m) have been recreated as E' and G' to serve as the parents of m' (after rebase, the tree history still remains the same).
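The "parents before children" ordering that makes this work can be illustrated with Python's standard `graphlib` module (Python 3.9+). This is only a sketch of the ordering constraint on Example 1's replayed commits, with the parent links typed in by hand; git's actual pick order may break ties differently.

```python
from graphlib import TopologicalSorter

# Example 1's replayed commits, each mapped to its parents inside the replay set.
replay = {
    "D": [],
    "E": ["D"],
    "F": ["D"],
    "G": ["F"],
    "m": ["E", "G"],
    "H": ["m"],
}

# static_order() yields each commit only after all of its parents.
order = list(TopologicalSorter(replay).static_order())
print(order)

# Both merge parents are replayed before the merge itself.
print(order.index("E") < order.index("m") and order.index("G") < order.index("m"))
```

Any order satisfying these constraints guarantees that E' and G' already exist when m' has to be created.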

Example 2

Unlike with normal rebase, merge-preserving rebase can create multiple children of the upstream head.

For example, consider:

      B---C <-- master
     /
    A-------D------E---m----H <-- topic
     \                 |
      ------- F-----G--/

If we rebase H (topic) on top of C (master), then the commits chosen for rebase are:

  • pick D
  • pick E
  • pick F
  • pick G
  • pick m
  • pick H

And the result is like so:

      B---C  <-- master
     /    | \
    A     |  D'----E'---m'----H' <-- topic
           \            |
             F'----G'---/
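A small sketch can confirm why C ends up with two children here. It assumes the parent-mapping rule from the algorithm section is applied to every replayed commit, with the merge base A replaced by master's tip C; the merge base and the pick list are read off the graph by hand rather than computed, and `remap` is a hypothetical helper.

```python
# Example 2's graph: commit -> parents (first parent first).
parents = {
    "A": [], "B": ["A"], "C": ["B"],
    "D": ["A"], "E": ["D"], "F": ["A"], "G": ["F"],
    "m": ["E", "G"], "H": ["m"],
}
merge_base, onto = "A", "C"                 # read off the graph by hand
replayed = ["D", "E", "F", "G", "m", "H"]   # the pick list above


def remap(p):
    """Pick the parent a rewritten commit should use in place of p."""
    if p == merge_base:
        return onto                         # boundary commit: attach to master's tip
    return p + "'" if p in replayed else p


rewritten = {c + "'": [remap(p) for p in parents[c]] for c in replayed}
children_of_onto = sorted(c for c, ps in rewritten.items() if onto in ps)
print(children_of_onto)  # ["D'", "F'"]
```

Both D and F had the merge base A as their parent, so both rewritten copies attach directly to C.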

Example 3

In the above examples, both the merge commit and its two parents are replayed commits, rather than the original parents that the original merge commit has. However, in other rebases a replayed merge commit can end up with parents that were already in the commit graph before the merge.

For example, consider:

      B--C---D <-- master
     /    \
    A---E--m------F <-- topic

If we rebase topic onto master (preserving merges), then the commits to replay will be

  • pick merge commit m
  • pick F

The rewritten commit graph will look like so:

      B--C--D <-- master
     /       \
    A-----E---m'--F' <-- topic

Here replayed merge commit m' gets parents that pre-existed in the commit graph, namely D (the HEAD of master) and E (one of the parents of the original merge commit m).
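The reason E is left alone while m and F are replayed can be checked on a toy copy of this graph. This is a sketch, not git's code: the merge base C is identified by hand, the parent order of m (E first, then C) is an assumption, and `ancestors` is a hypothetical helper.

```python
# Example 3's graph: commit -> parents. Parent order of m is assumed.
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],
    "E": ["A"], "m": ["E", "C"], "F": ["m"],
}


def ancestors(tip):
    """Return tip plus every commit reachable from it through parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


merge_base = "C"                              # most recent common ancestor of D and F
in_range = ancestors("F") - ancestors("D")    # on topic but not on master
picked = {c for c in in_range if merge_base in ancestors(c)}
skipped = in_range - picked                   # in range, but not descended from C

# m's parents: E is not being rewritten, so it is kept; C is the boundary, so D replaces it.
new_m_parents = ["D" if p == merge_base else p for p in parents["m"]]
print(sorted(picked), sorted(skipped), new_m_parents)
```

E is reachable from topic but is not a descendant of the merge base, so it stays where it is and m' simply points back at it.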

Example 4

Merge-preserving rebase can get confused in certain "empty commit" cases. At least this is true in some older versions of git (e.g. 1.7.8).

Take this commit graph:

    A--------B-----C-----m2---D <-- master
     \        \         /
       E--- F--\--G----/
             \  \
              ---m1--H <--topic

Note that both commits m1 and m2 should have incorporated all the changes from B and F.

If we try to do git rebase --preserve-merges of H (topic) onto D (master), then the following commits are chosen for replay:

  • pick m1
  • pick H

Note that the changes (B, F) united in m1 should already be incorporated into D. (Those changes should already be incorporated into m2, because m2 merges together the children of B and F.) Therefore, conceptually, replaying m1 on top of D should probably either be a no-op or create an empty commit (i.e. one where the diff between successive revisions is empty).

Instead, however, git may reject the attempt to replay m1 on top of D. You can get an error like so:

    error: Commit 90caf85 is a merge but no -m option was given.
    fatal: cherry-pick failed

It looks like one forgot to pass a flag to git, but the underlying problem is that git dislikes creating empty commits.
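One way to see why m1 is an awkward commit to replay is to run the same toy model on this graph. The sketch below is an illustration only and makes no claim about git's internals: the parent order of m1 is assumed, and the "not behind another common ancestor" rule is a simplified stand-in for what `git merge-base --all` reports.

```python
# Example 4's graph: commit -> parents.
parents = {
    "A": [], "B": ["A"], "C": ["B"],
    "E": ["A"], "F": ["E"], "G": ["F"],
    "m2": ["C", "G"], "D": ["m2"],
    "m1": ["B", "F"],          # parent order assumed for this sketch
    "H": ["m1"],
}


def ancestors(tip):
    """Return tip plus every commit reachable from it through parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


common = ancestors("D") & ancestors("H")
# Keep only the common ancestors that are not behind another common ancestor.
bases = {c for c in common
         if not any(c != d and c in ancestors(d) for d in common)}
picked = ancestors("H") - ancestors("D")

# Every parent of m1 that is a merge base gets replaced by master's tip, D.
m1_new_parents = ["D" if p in bases else p for p in parents["m1"]]
print(sorted(bases), sorted(picked), m1_new_parents)
```

In this model there are two merge bases (B and F), and they are exactly the parents of m1, so both parents of the recreated merge map to D. Nothing distinct is left to merge, which fits the expectation above that replaying m1 should be a no-op or an empty commit.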

15 revs, 12 users 87% answered Sep 22 '22 14:09