
Why does git not automatically reapply conflict resolutions when performing a rebase-merges rebase?

(Similar to this question, but with some context and demonstration of why rerere is not an answer.)

For the given history:

                /...o      origin/master
o...o...o...o...o...o...o  master
    \...o........../       topic

I have a topic branch which I've merged into master, and made one additional commit. Meanwhile, someone upstream has made another commit on origin/master, so I can no longer push my master as-is.

I want to rebase my master onto origin/master without altering the commit SHA on topic and without losing the conflict resolution already performed on master. (This is by far my most common case of wanting to preserve merge commits, so I'm surprised that this is apparently so difficult.)

With rerere enabled, git rebase -p almost works -- for any conflicts in the original merge, it remembers what I did to fix them and reapplies this (although it leaves the file marked as conflicted, so I have to remember to mark each one as already resolved without restarting conflict resolution on the file, which is mildly annoying from the TortoiseGit front-end). But if there were any other changes to files that were also fixed in the merge commit (eg. lines purely added in the merge without conflicts, but still needed to be corrected due to changes elsewhere), these are lost.

Here's the thing though. In my (perhaps flawed) understanding of merge commits, they consist of two (or more) parents and a unique changeset (used to store the conflict resolutions, plus any other changes made before committing the merge or later amended to the merge commit). It appears that rebase -p re-creates the merge commit but completely discards this extra changeset.

Why doesn't it reapply the changeset from the original merge commit? That would make rerere redundant and avoid losing these additional changes. It could leave the affected files marked as conflicts if it wanted human confirmation, but in many cases this automatic resolution would be entirely sufficient.

To put it another way, to label some of the commits above:

                /...N      origin/master
o...o...o...o...B...M...A  master
    \...T........../       topic

T - the commit on topic
B - the merge-base of origin/master and master
N - the new commit on origin/master
M - the merge between B and T
A - the extra post-merge commit

M has parents B and T and a unique changeset Mc. When creating M', git performs a new merge between parents N and T, and discards Mc. Why can't git just reapply Mc instead of discarding it?

In the end, I want the history to look like this:

o...o...o...o...B...N...M'...A'  master
    \...T............../

Where M' and A' change SHA1 from the rebase, but M' includes the Mc changeset and T didn't change SHA1 or parent. And now I can fast-forward origin/master to A'.


I have also noticed that there's a new option --rebase-merges which sounded nice at first and does result in the right graph afterwards -- but just like --preserve-merges still stops with conflicts on M' and loses any unique changes in Mc not otherwise saved by rerere.


An alternate formulation of the question which might be more useful:

Given the initial state above, and having just started an interactive rebase that is now in either HEAD1 or HEAD2 states:

        /...........(T)
       /               \
      /             /...M'  HEAD2
     /              /...    HEAD1
    /           /...N       origin/master
o...o...o...o...B...M...A   master
    \...T........../        topic

(HEAD1 has checked out N but done nothing else yet; HEAD2 has created a new merge with N as parent 1 and T as parent 2 but hasn't committed yet due to unresolved conflicts)

Is there some sequence of rebase commands and/or git commands which will:

  1. Calculate the diff Mc between M and B (choosing B because the other parent T is not changing)
  2. Apply this to the conflicted tree M' (which should completely resolve all conflicts, unless N introduces new ones), or simply apply it on top of N without first doing any merge. These should be equivalent; the second might be easier.
  3. Pause for a human to resolve any remaining conflicts introduced by N, if any.
  4. Commit M' as a merge between N and T
  5. Continue as usual (in this case rebasing A to A' on top of M')

And why doesn't git do this by default?

Miral asked Jan 25 '19 05:01


1 Answer

The fundamental reason that git rerere cannot record the non-conflicts is that git rerere is implemented in a cheap and dirty manner: Git takes each initial conflict, strips it of some data to make it more applicable (the same way that git patch-id strips line numbers and some white-space), hashes what is left to obtain an ID, and saves the normalized conflict under that ID in the rerere directory (.git/rr-cache). Later, when you git commit the result, Git pairs that one specific conflict with its resolution by saving the resolved file next to it. So it only "knows" the conflicts, not any other changes.

The later merge (with its conflicts) tries saving the conflicts again, gets the same ID again, and finds the pairing, so it uses the saved resolution. Since the non-conflicted changes aren't saved here, they never show up as part of this process.
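The behaviour described above can be reproduced in a throwaway repository. The following is only a sketch under stated assumptions: the file names f.txt and g.txt, the commit messages, and the temporary directory are invented for illustration, and rerere is left at its default of not updating the index. It resolves a conflicted merge while also slipping in an unrelated, non-conflicting edit, then redoes the same merge to see what rerere brings back.

```shell
#!/bin/sh
# Sketch: rerere replays the recorded conflict resolution, but not
# other edits that were made in the original merge commit.
set -eu
repo=$(mktemp -d)                 # hypothetical scratch repository
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.invalid
git config commit.gpgsign false
git config rerere.enabled true

printf 'line\n' > f.txt           # f.txt will conflict
printf 'one\n'  > g.txt           # g.txt will merge cleanly
git add f.txt g.txt
git commit -q -m base

git checkout -q -b topic
printf 'topic\n' > f.txt
git commit -q -am topic

git checkout -q master
printf 'master\n' > f.txt
git commit -q -am master

# First merge: conflict in f.txt, rerere records the conflict.
git merge topic >/dev/null 2>&1 || true
printf 'resolved\n' > f.txt       # the conflict resolution
printf 'one\nextra\n' > g.txt     # an extra change outside any conflict
git add f.txt g.txt
git commit -q -m merge            # rerere records the resolution here

# Throw the merge away and perform it again.
git reset -q --hard HEAD^
git merge topic >/dev/null 2>&1 || true

f_after=$(cat f.txt)
g_after=$(cat g.txt)
echo "f.txt after redo: $f_after"
echo "g.txt after redo: $g_after"
ls .git/rr-cache/*/
```

In this setup f.txt should come back with the recorded resolution, while g.txt should contain only what the plain merge produces. The f.txt path also stays marked as unmerged in the index, which matches the question's observation.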

Git could perhaps save more, but it doesn't.

> In my (perhaps flawed) understanding of merge commits, they consist of two (or more) parents and a unique changeset (used to store the conflict resolutions, plus any other changes made before committing the merge or later amended to the merge commit).

This is incorrect. All commits are just snapshots of state. Merges are not special here—just like non-merge commits, they have a complete source tree. What is special about them is that they have two (or more) parents.
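One way to see this is to print a merge commit's raw object. The sketch below builds a tiny hypothetical history (the file names and commit messages are made up for the example) and inspects the merge with git cat-file and git ls-tree. The commit object lists one tree and two parent lines, with no stored changeset.

```shell
#!/bin/sh
# Sketch: a merge commit is a full snapshot (one tree) plus two parents.
set -eu
repo=$(mktemp -d)                 # hypothetical scratch repository
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.invalid
git config commit.gpgsign false

echo base > file.txt
git add file.txt
git commit -q -m B

git checkout -q -b topic
echo topic > topic.txt
git add topic.txt
git commit -q -m T

git checkout -q master
echo other > other.txt
git add other.txt
git commit -q -m X                # master diverges, so a true merge is needed

git merge -q --no-ff -m M topic   # the merge commit M

# Raw commit object: "tree", two "parent" lines, author, committer, message.
git cat-file -p HEAD

parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
tree_count=$(git cat-file -p HEAD | grep -c '^tree ')
echo "parents: $parent_count, trees: $tree_count"

# The tree is the complete state, including files from both sides.
git ls-tree --name-only HEAD
```

Any "changeset" of a merge is therefore something Git computes on demand by comparing the merge's tree against one of its parents, for example with git diff M^1 M.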

Copying a non-merge, as git cherry-pick does (and as git rebase does, either by invoking git cherry-pick repeatedly or by doing something similar but not quite as good), works by using the commit's (one and only) parent as the merge base for the merge-as-a-verb operation. Copying a merge is not possible in general, and rebase doesn't try: it just re-performs the merge.

(On the other hand, git cherry-pick will let you cherry-pick a merge, using its -m option to select one particular parent. Git simply pretends that that is the lone parent for the duration of the three-way merge operation. In theory, the rebase code could do the same: -m 1 is almost always the correct parent, and one can always use the low-level git commit-tree to make the actual commit, so as to make it a merge commit. But git rebase does not do this.)
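Done by hand, the idea in the parenthetical above might look like the following. This is a sketch under assumptions, not something git rebase does for you: it runs in a scratch repository, uses a local branch named upstream as a stand-in for origin/master, and invents the file names. It cherry-picks M with -m 1 onto N without committing, then uses git commit-tree to record the result with parents N and T, and finally copies A on top.

```shell
#!/bin/sh
# Sketch: recreate merge M on top of N while keeping M's own changes,
# using "cherry-pick -m 1" plus "commit-tree" as described above.
set -eu
repo=$(mktemp -d)                 # hypothetical scratch repository
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.invalid
git config commit.gpgsign false

printf 'line\n' > f.txt; printf 'one\n' > g.txt
git add f.txt g.txt; git commit -q -m O
git checkout -q -b topic
printf 'topic\n' > f.txt; git commit -q -am T
git checkout -q master
printf 'master\n' > f.txt; git commit -q -am B
git branch upstream               # stand-in for origin/master, at B

git merge topic >/dev/null 2>&1 || true    # conflicts in f.txt
printf 'resolved\n' > f.txt                # conflict resolution
printf 'one\nextra\n' > g.txt              # extra non-conflict change
git add f.txt g.txt; git commit -q -m M
printf 'a\n' > a.txt; git add a.txt; git commit -q -m A

git checkout -q upstream
printf 'n\n' > n.txt; git add n.txt; git commit -q -m N

T=$(git rev-parse topic);   N=$(git rev-parse upstream)
A=$(git rev-parse master);  M=$(git rev-parse master^)

# Apply the diff between M's first parent (B) and M on top of N.
git checkout -q --detach "$N"
git cherry-pick -m 1 --no-commit "$M"

# Record the result as a merge of N and T, then copy A onto it.
tree=$(git write-tree)
new_m=$(git commit-tree "$tree" -p "$N" -p "$T" -m "M'")
git reset -q --hard "$new_m"
git cherry-pick "$A" >/dev/null
git checkout -q -B master

git log --graph --oneline master
```

If N had touched the same lines as the changes in M, the cherry-pick would stop with conflicts, and those would need to be resolved and staged before git write-tree can succeed.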

> ... if there were any other changes to files that were also fixed in the merge commit (eg. lines purely added in the merge without conflicts, but still needed to be corrected due to changes elsewhere), these are lost.

Yes (for the reason discussed above). That is perhaps one reason people refer to such things as an "evil merge" (though another possible reason for the phrase is that such changes were, at least to all evidence one has available in the future, not actually requested by anyone). While it does not help your goal with existing merges, I would advise not making such changes: instead, make those changes before or after the merge, in an ordinary non-merge commit that feeds into or out of the merge, so that a later rebase -p or rebase --rebase-merges can preserve them.

torek answered Oct 13 '22 10:10