Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge made by 'recursive' strategy

Tags:

git

merge

I understood that git merge recursive actually happens when there is more than 1 common ancestor, and it will create a virtual commit to merge these common ancestors before proceeding to merge the more recent commits (sorry i am not sure whether there should be a term for this).

But I have been trying to find more information on how git merge recursive strategy actually works in detail but not much info can be found.

Can anyone explain in details how git merge recursive really perform, with examples and possibly flow maps to help visualizing better?

like image 632
tnkh Avatar asked May 06 '19 03:05

tnkh


People also ask

What is merge made by recursive strategy?

Recursive is the default merge strategy when pulling or merging one branch. Additionally this can detect and handle merges involving renames, but currently cannot make use of detected copies. This is the default merge strategy when pulling or merging one branch.

How do you undo a merge by a recursive strategy?

You can use the Git reset command to undo a merge. Firstly, you need to check for the commit hash (or id) so you can use it to go back to the previous commit. To check for the hash, run git log or git reflog . git reflog is a better option because things are more readable with it.

What is a merging strategy?

A merger is a corporate strategy to combine with another company and operate as a single legal entity. The companies agreeing to mergers are typically equal in terms of size and scale of operations.


1 Answers

You can find a description here (see also part 2):

When is merge recursive needed?

(Git 2.30, Q1 2020, will have a new merge-ort strategy)

What if we find "two common ancestors"? The branch explorer view below shows an alternative in which there are two possible "common ancestors".

initial situation

Please note: the example is a little bit forced since there's not a good reason – initially – for the developer merging from changeset 11 into 16 instead of merging from changeset 15 (the latest from the branch main at the point of the merge).
But let's assume it has to be done for a reason, let's say, changeset 11 was stable and 13 and 15 weren't at the time, for instance.

The point is: between 15 and 16 there's not a single unique ancestor, but rather, two ancestors at the same "distance": 12 and 11.

While this won't happen frequently, it is really likely to happen with long lived branches or complex branch topologies. (The case depicted above is the shortest one driving to the "multiple ancestor" problem, but it can happen too with several changesets and branches in between the "crossed" merges).

One solution is to "select" one of the ancestors as the valid one for the merge (which is the option Mercurial takes) but it has many drawbacks.

How merge recursive works?

When more than one valid ancestor is found, the recursive-merge strategy will create a new unique "virtual ancestor" merging the ones initially found.

The following image depicts the algorithm:

algo

A new ancestor 2 will be used as "ancestor" to merge the "src" and "dst".

merge

The "merge recursive strategy" is able to find a better solution than just "selecting one of the two" as I'll describe below.


Note: the merge recursive strategy was initially the merge "fredrik" strategy (see commit e4cf17c, Sept. 2005, Git v0.99.7a), after Fredrik Kuivinen.
It was a python script, initiated in commit 720d150, and it illustrates the original algorithm.

For more details, consider "Current Concepts in Version Control Systems from Petr Baudiˇs 2009-09-11", page 17.

|B| = 1 : b(B) = B0
|B| = 2 : b(B) = M(LCA(B0, B1), B0, B1)
M(B, x, y) = ∆−1
(b(B), x ∪ y)
m(x, y) = M(LCA(x, y), x, y)

(Yes, I don't know either how to read this)

In case of conflict, the main idea of the algorithm is to simply leave the conflict markers in place when using the result as a base for further merges.
This means that earlier conflicts are properly propagated as well as conflicting changes in newer revisions.

This refers to revctrl.org/CrissCrossMerge, which describes the contexte of a recursive merge in a criss-cross merge.

A criss-cross merge is an ancestry graph in which minimal common ancestors are not unique.
The simplest example with scalars is something like:

  a
 / \
b1  c1
|\ /|
| X |
|/ \|
b2  c2

The story one can tell here is that Bob and Claire made some change independently, then each merged the changes together.
They conflicted, and Bob (of course) decided his change was better, while Claire (typically) picked her version.
Now, we need to merge again. This should be a conflict.

Note that this can happen equally well with a textual merger -- they have each edited the same place in the file, and when resolving the conflict they each choose to make the resulting text identical to their original version (i.e., they don't munge the two edits together somehow, they just pick one to win).

So:

Another possible solution is to first merge 'b1' and 'c1' to a temporary node (basically, imagine that the 'X' in the diagram is actually a revision, not just edges crossing) and then use that as a base for merging 'b2' and 'c2'.

The interesting part is when merging 'b1' and 'c1' results in conflicts - the trick is that in that case, 'X' is included with the conflicts recorded inside (e.g. using the classical conflict markers).

Since both 'b2' and 'c2' had to resolve the same conflict, in the case they resolved it the same way they both remove the conflicts from 'X' in the same way and a clean merge results; if they resolved it in different ways, the conflicts from 'X' get propagated to the final merge result.

That is what torek described in "git merge: how did I get a conflict in BASE file?" as an "asymmetric result":

"These asymmetric results were harmless, except for the time bomb itself plus the fact that you later ran a recursive merge.
You get to see the conflict. It's up to you to resolve it — again — but this time there's no easy ours/theirs trick, if that worked for persons C and D."

Resuming from revctrl.org/CrissCrossMerge:

If a merge would result in more than two bases ('b1', 'c1, 'd1'), they are merged consecutively - first 'b1' with 'c1' and then the result with 'd1'.

This is what "Git"'s "recursive merge" strategy does.


With Git 2.29 (Q4 2020), in preparation for a new merge strategy backend, does provide a good description of conflicts and the role of a recursive merge strategy:

(Again, Git 2.30, Q1 2020, will have a new merge-ort strategy)

See commit 1f3c9ba, commit e8eb99d, commit 2a7c16c, commit 1cb5887, commit 6c74948, commit a1d8b01, commit a0601b2, commit 3df4e3b, commit 3b6eb15, commit bc29dff, commit 919df31 (10 Aug 2020) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 36d225c, 19 Aug 2020)

t6425: be more flexible with rename/delete conflict messages

Signed-off-by: Elijah Newren

First, there's a basic conflict type known as modify/delete, which is a content conflict.
It occurs when one side deletes a file, but the other modifies it.

There is also a path conflict known as a rename/delete.
This occurs when one side deletes a path, and the other renames it.
This is not a content conflict, it is a path conflict.
It will often occur in combination with a content conflict, though, namely a modify/delete.
As such, these two were often combined.

Another type of conflict that can exist is a directory/file conflict. For example, one side adds a new file at some path, and the other side of history adds a directory at the same path.
The path that was "added" could have been put there by a rename, though.
Thus, we have the possibility of a single path being affected by a modify/delete, a rename/delete, and a directory/file conflict.

In part, this was a natural by-product of merge-recursive's design.
Since it was doing a four way merge with the contents of the working tree being the fourth factor it had to consider, it had working tree handling spread all over the code.
It also had directory/file conflict handling spread everywhere through all the other types of conflicts.

A natural outgrowth of this kind of structure is conflict messages that combine all the different types that the current codepath is considering.

However, if we want to make the different conflict types orthogonal and avoid repeating ourselves and getting very brittle code, then we need to split the messages from these different conflict types apart.
Besides, trying to determine all possible permutations is a royal mess.
The code to handle the rename/delete/directory/file conflict output is already somewhat hard to parse, and is somewhat brittle.
But if we really wanted to go that route, then we'd have to have special handling for the following types of combinations:

  • rename/add/delete: on side of history that didn't rename the given file, remove the file instead and place an unrelated file in the way of the rename
  • rename/rename(2to1)/mode conflict/delete/delete: two different files, one executable and the other not, are renamed to the same location, each side deletes the source file that the other side renames
  • rename/rename(1to2)/add/add: file renamed differently on each side of history, with each side placing an unrelated file in the way of the other
  • rename/rename(1to2)/content conflict/file location/(D/F)/(D/F)/: both sides modify a file in conflicting way, both rename that file but to different paths, one side renames the directory which the other side had renamed that file into causing it to possibly need a transitive rename, and each side puts a directory in the way of the other's path.

Let's back away from this path of insanity, and allow the different types of conflicts to be handled by separate pieces of non-repeated code by allowing the conflict messages to be split into their separate types. (If multiple conflict types affect a single path, the conflict messages can be printed sequentially.) Start this path with a simple change: modify this test to be more flexible and accept the output either merge backend (recursive or the new ort) will produce.


Note that Git 2.22 (Q2 2019) will improve that recursive merge strategy, since git merge-recursive" backend recently (Git 2.18) learned a new heuristics to infer file movement based on how other files in the same directory moved.

As this is inherently less robust heuristics than the one based on the content similarity of the file itself (rather than based on what its neighbours are doing), it sometimes gives an outcome unexpected by the end users. This has been toned down to leave the renamed paths in higher/conflicted stages in the index so that the user can examine and confirm the result.

See commit 8c8e5bd, commit e62d112, commit 6d169fd, commit e0612a1, commit 8daec1d, commit e2d563d, commit c336ab8, commit 3f9c92e, commit e9cd1b5, commit 967d6be, commit 043622b, commit 93a02c5, commit e3de888, commit 259ccb6, commit 5ec1e72 (05 Apr 2019) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 96379f0, 08 May 2019)

merge-recursive: switch directory rename detection default

When all of x/a, x/b, and x/c have moved to z/a, z/b, and z/c on one branch, there is a question about whether x/d added on a different branch should remain at x/d or appear at z/d when the two branches are merged.
There are different possible viewpoints here:

A) The file was placed at x/d; it's unrelated to the other files in x/ so it doesn't matter that all the files from x/ moved to z/ on one branch; x/d should still remain at x/d.

B) x/d is related to the other files in x/, and x/ was renamed to z/; therefore x/d should be moved to z/d.

Since there was no ability to detect directory renames prior to Git 2.18, users experienced (A) regardless of context.
Choice (B) was implemented in Git 2.18, with no option to go back to (A), and has been in use since.
However, one user reported that the merge results did not match their expectations, making the change of default problematic, especially since there was no notice printed when directory rename detection moved files.

Note that there is also a third possibility here:

C) There are different answers depending on the context and content that cannot be determined by Git, so this is a conflict.
Use a higher stage in the index to record the conflict and notify the user of the potential issue instead of silently selecting a resolution for them.

Add an option for users to specify their preference for whether to use directory rename detection, and default to (C).
Even when directory rename detection is on, add notice messages about files moved into new directories.


With Git 2.31 (Q1 2021), the "ORT" merge strategy (that I presented here) impacts the legacy recursive strategy.

See commit c5a6f65, commit e2e9dc0, commit 04af187, commit 43c1dcc, commit 1c7873c, commit 101bc5b, commit 6784574 (03 Dec 2020) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 85cf82f, 06 Jan 2021)

merge-ort: add modify/delete handling and delayed output processing

Signed-off-by: Elijah Newren

The focus here is on adding a path_msg() which will queue up warning/conflict/notice messages about the merge for later processing, storing these in a pathname -> strbuf map.
It might seem like a big change, but it really just is:

  • declaration of necessary map with some comments
  • initialization and recording of data
  • a bunch of code to iterate over the map at print/free time
  • at least one caller in order to avoid an error about having an unused function (which we provide in the form of implementing modify/delete conflict handling).

At this stage, it is probably not clear why I am opting for delayed output processing.
There are multiple reasons:

  1. Merges are supposed to abort if they would overwrite dirty changes in the working tree.
    We cannot correctly determine whether changes would be overwritten until both rename detection has occurred and full processing of entries with the renames has finalized.
    Warning/conflict/notice messages come up at intermediate codepaths along the way, so unless we want spurious conflict/warning messages being printed when the merge will be aborted anyway, we need tosave these messages and only print them when relevant.

  2. There can be multiple messages for a single path, and we want all messages for a give path to appear together instead of having them grouped by conflict/warning type.
    This was a problem already with merge-recursive.c but became even more important due to the splitting apart of conflict types as discussed in the commit message for 1f3c9ba707 ("t6425: be more flexible with rename/delete conflict messages", 2020-08-10, Git 2.29)

  3. Some callers might want to avoid showing the output in certain cases, such as if the end result is a clean merge.
    Rebases have typically done this.

  4. Some callers might not want the output to go to stdout or even stderr, but might want to do something else with it entirely.
    For example, a --remerge-diff option to git show or git log -p that remerges on the fly and diffs merge commits against the remerged version would benefit from stdout/stderr not being written to in the standard form.

like image 128
VonC Avatar answered Oct 17 '22 09:10

VonC