Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Git merge internals

This is probably going to end up being a long question, so please bear with me.

I came across an incredible explanation for git merge decisions here: How does git merge work. I am trying to build on this explanation and see if there are any holes in depicting git merge this way. Essentially, the decision of whether or not a line shows up in the merged file can be depicted by a truth table:

W: original file, A: Alice's branch, B: Bob's branch

truth-table

Based on this truth table, it is straightforward to think up a line based algorithm to construct D: Construct D line-by-line by looking at corresponding lines from A and B and making a decision based on truth-table.

My first question is the case (0, 0, 1) which according to the link I posted above, seems to suggest that while that case is actually a conflict, git usually handles it by deleting the line anyway. Can this case actually ever lead to a conflict?

My second question is about deletion cases— (0, 1, 1) and (1, 0, 1). Intuitively, I feel the way these cases are handled might lead to a problem. Let’s say there a was function foo() in W. This function was never actually called in any piece of code. Let’s say in branch A, Alice finally decided to remove foo(). However, in branch B, Bob finally decided on a use for foo() and wrote another function bar() that called foo(). Just intuitively, based on the truth-table, it seems like the merged file will end up deleting the foo() function and adding bar() and Bob would be left wondering why foo() doesn’t work anymore! Which probably leads me to think that the truth-table model I derived for the 3 way merge, is probably not complete and missing something?

like image 519
Sandman Avatar asked May 23 '15 06:05

Sandman


Video Answer


1 Answers

My first question is the case (0, 0, 1)

Some version control systems like darcs consider that doing the same change (in your case, deletion) in two branches and merging them should lead to a conflict. The typical example is when you have twice

-#define NUMBER_OF_WHATEVER 42
+#define NUMBER_OF_WHATEVER 43

The merge algorithm cannot know for you whether you want the merge to yield 43 (because this is the value both versions agree on) or 44 (because 42 should be incremented twice).

However, considering this case as a conflict causes a lot of spurious conflicts. For example, if one cherry-picks a merge from the master branch to a maintainance branch and later merges the maintainance branch into master, then each line modified by the cherry-pick would lead to a conflict. And the conflict markers would be weird, because they would show the same content on both sides of the conflict marker, like

<<<<<<< HEAD
Hello world
=======
Hello world
>>>>>>> 77976da35a11db4580b80ae27e8d65caf5208086

So, most version-control systems, including Git, chose to consider no conflict when both sides of the merge introduce the same change.

My second question is about deletion cases— (0, 1, 1) and (1, 0, 1).

What you are describing is semantic conflicts. They do exist in theory, and you can even find corner-cases where the merge is compilable but has different semantics compared to the branches being merged. There's no magic, no textual merge algorithm can detect or resolve semantic conflicts. You have to live with them, or work alone.

In practice, they are rare enough. There are probably millions of people using a version control system daily and living with it. The majority probably never thought the problem could exist.

Still, a good organization considerably reduces the risk of semantic conflicts. If you check that your code still compiles after merges, you avoid ~90% of semantic conflicts, and if you have an automatic testsuite, then you'd have to find a semantic conflicts that creates a bug not covered by your testsuite for it to be problematic.

And actually, semantic conflicts are not specific to version-control systems. Another scenario not using merge is

  • I read the code and see a function f()
  • My coworker removes function f()
  • Working on the latest version, which doesn't have f() anymore, I still remember that there's a function f() and I try to use it.

In short, don't be afraid of semantic conflicts.

like image 152
Matthieu Moy Avatar answered Oct 20 '22 05:10

Matthieu Moy