Why does this GIT merge not result in conflicts?

Tags: git

We discovered a serious issue with GIT at work today and I would like to know whether this is a bug or by design, and also, how to resolve this issue.

Consider the following sequence of events:

  1. Create Branch "test1" on Master
  2. Switch to Master:
    1. Edit a file and commit the change as commit "X"
  3. Switch to "test1":
    1. Cherry-pick commit "X" from Master
    2. Revert previous commit
    3. Merge Master into "test1"

Result: No merge conflict is reported even though the file has been edited in both branches, and even worse, the revert in step 3.2 did not survive, even though it was the most recent commit.
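For anyone who wants to try this without the download, the sequence above can be sketched as a shell script. This is only a sketch, not the uploaded example: the file name file.txt, its contents, the commit messages and the identity settings are all made up. The -x on cherry-pick is an addition of mine so that the copy gets a different commit message; without it, a script that runs within a single second could produce a copy identical to X, which would change the scenario.

```shell
#!/bin/sh
set -eu

# Throwaway repository; every name and value below is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use "master" whatever init.defaultBranch says
git config user.name "Example User"
git config user.email "user@example.invalid"

echo "original" > file.txt
git add file.txt
git commit -q -m "base"

git branch test1                          # step 1: create test1 on master
echo "edited" > file.txt
git commit -q -am "X"                     # step 2.1: commit X on master
git checkout -q test1                     # step 3: switch to test1
git cherry-pick -x master >/dev/null      # step 3.1: copy X onto test1
git revert --no-edit HEAD >/dev/null      # step 3.2: revert the copy
git merge --no-edit master >/dev/null     # step 3.3: merge master into test1

cat file.txt                              # content of the file after the merge
```

With these inputs the merge completes without a conflict and the last line prints "edited", i.e. the revert from step 3.2 is gone.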

This is a huge problem, as the following recent example shows: a colleague of mine had committed similar changes to different branches, noticed that parts of these changes were malicious, and therefore manually reverted those parts on one of the branches. After merging the branches, he was surprised to find that his revert had not survived the merge.

I uploaded a minimal example to google drive that demonstrates this issue. You can merge Master into test1 or vice versa, to see for yourself.

https://drive.google.com/drive/folders/19a-QPwOQKsn9PywUPd2DRnvUOml03nZ-?usp=sharing

In case it matters, I use TortoiseGit 2.12.0.0 with Git for Windows 2.32.0.2.

STiFU asked Jul 28 '21 09:07


2 Answers

You got that result because that is the correct result.

Well, let's modify that statement: that is the correct result by the rules of Git merge. (It's also correct by the rules of most other merge programs, as far as I know, but there are algorithms that would at least flag this for attention. Git does not use such an algorithm.)

If Git's merge result is not the result you wanted, you have remedies: see below.

When Git does a merge, Git pays attention to three snapshots:

  • One snapshot is the current snapshot, i.e., the commit whose hash ID you will get if you run git rev-parse HEAD. If HEAD is attached to a branch name (as it normally is), that's the tip commit of the given branch.

  • One snapshot is the one you name on the command line: git merge foo looks up foo to get a commit hash ID.

  • The third, and in many ways the most important, snapshot is the merge base. (Git numbers this one "#1", with HEAD / --ours being #2 and the other / --theirs being #3, internally, even though we have to locate the other two inputs before we can locate this merge base input.) The commit hash ID of the merge base is located through the commit graph. In your case, it's the commit just before the commit you are calling commit X.

Let's draw these commits like this, with newer commits towards the right and single uppercase letters standing in for each actual commit hash ID:

          X   <-- master
         /
...--G--H   <-- here's where both branches start diverging
         \
          X'-X"  <-- test1

Here, commit X has the changes you made on master; commit X' has the same changes, and X" undoes those changes, so that the snapshot in X" exactly matches the snapshot in H.
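Both claims (the merge base is H, and the snapshot in X" matches the one in H) can be checked with git merge-base and git diff. The following is a minimal sketch that rebuilds the history from the diagram in a throwaway repository; the file name, contents and commit messages are assumptions, and -x is only used so that X' is certain to be a distinct commit from X.

```shell
#!/bin/sh
set -eu

# Rebuild the diagram: H on both branches, X on master, X' and X" on test1.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.invalid"
echo "original" > file.txt
git add file.txt
git commit -q -m "H"
git branch test1
echo "edited" > file.txt
git commit -q -am "X"
git checkout -q test1
git cherry-pick -x master >/dev/null      # X'
git revert --no-edit HEAD >/dev/null      # X"

# The merge base is found through the commit graph.
base=$(git merge-base master test1)
git log -1 --format=%s "$base"            # subject of the merge base commit

# git diff --quiet exits 0 when the two snapshots are identical.
git diff --quiet "$base" test1  && echo "test1 matches the merge base"
git diff --quiet "$base" master || echo "master differs from the merge base"
```

In this setup the script prints "H" followed by the two messages: from the merge's point of view, only master changed anything.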

Git's merge algorithm consists of doing the following, given that you're on master (so that commit X is the current / HEAD commit) and are merging test1 (commit X"):

  • Extract commit H as "stage 1".

  • Extract commit X as "stage 2".

  • Extract commit X" as "stage 3". (Note that we have just established that the contents of X" match those of H.)

  • For each file that exists in all three stage slots of the index (a.k.a. the staging area):

    • Compare the copy in slot 1 vs the copies in slots 2 and 3.
    • For any file identical in all 3 slots, the result is any version of the file (all three match).
    • For any file identical in slot 1 and either slot 2 or slot 3, the result is the non-identical version: take the changed file.
    • For any file where all three slots differ, run a diff algorithm to find individual changes, and combine them. This step can have merge conflicts, but if not, the result is the correct merge.
  • For files that don't exist in all three slots (e.g., where renames or copies may have happened), things get more complicated. These can result in high level aka tree conflicts, which are treated as conflicts but don't show up as conflicted working-tree copies. But this case doesn't apply here so we get to ignore it.
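The per-file rule for files present in all three slots can be sketched as a small shell function. This is an illustration of the rule as described above, not Git's actual code: resolve_file is a hypothetical helper, the three inputs are plain files standing in for the three slots, and git merge-file is used as a stand-in for the content-level merge in the last case.

```shell
#!/bin/sh
set -eu

# resolve_file BASE OURS THEIRS (hypothetical helper): print the merged file.
resolve_file() {
    base=$1
    ours=$2
    theirs=$3
    if cmp -s "$base" "$ours" && cmp -s "$base" "$theirs"; then
        cat "$base"        # all three match: any version will do
    elif cmp -s "$base" "$ours"; then
        cat "$theirs"      # only "theirs" changed: take the changed file
    elif cmp -s "$base" "$theirs"; then
        cat "$ours"        # only "ours" changed: take the changed file
    else
        # Both sides changed: combine individual changes. The exit status of
        # git merge-file is the number of conflicts, so non-zero means conflict.
        git merge-file -p "$ours" "$base" "$theirs"
    fi
}

# The situation from the question, with made-up contents.
tmp=$(mktemp -d)
printf 'original\n' > "$tmp/base"     # slot 1: commit H
printf 'edited\n'   > "$tmp/ours"     # slot 2: commit X on master
printf 'original\n' > "$tmp/theirs"   # slot 3: commit X", identical to H

resolve_file "$tmp/base" "$tmp/ours" "$tmp/theirs"
```

This prints "edited": slot 3 is identical to slot 1, so the file-level rule picks slot 2 without ever looking at the history that led to X".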

The meat of git merge is done, but there are now cleanup steps:

  • Files that are correctly merged are dropped to slot 0 (and written out to the working tree if/as needed).

  • Files that have merge conflicts are left in all three slots; the working tree gets Git's best-effort at merging, including conflict markers.

  • In the case of conflicts, the merge now stops in the middle; the user must complete it. The working tree and index copies of each file exist for the user's use here.

  • Otherwise, unless told to stop without committing, git merge finishes the merge on its own, typically by creating a new merge commit.

If this merge result is not what you wanted, your remedies include, but are not limited to, the following:

  • Change the inputs (e.g., by adding more commits to one or both branches).

  • Use git merge --no-commit so that git merge stops before committing the merge result. Use the index and working tree files (all staged for commit now, with just one version of each file in index slot #0) to produce the result you'd like instead. Then, commit the result. Note that this is known as an evil merge. There's nothing really wrong with it, but if you ever have Git repeat the merge (e.g., using the fancy new git rebase --rebase-merges code), Git won't know to make the new merge an evil merge, so it's wise to mark this clearly with the commit message or something.

  • Do the merge, let it be "wrong", commit it (and maybe mark it, especially for later skip-during-bisect), then add a commit to fix things.
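As an illustration of the second remedy (stopping the merge before it commits, using git merge --no-commit), here is a minimal sketch on a throwaway repository. The history, file name and commit messages are assumptions made for the example; whether keeping test1's version is the right outcome is a decision for a human, not something the script determines.

```shell
#!/bin/sh
set -eu

# Same made-up history as in the question: X on master, X' and X" on test1.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.invalid"
echo "original" > file.txt
git add file.txt
git commit -q -m "H"
git branch test1
echo "edited" > file.txt
git commit -q -am "X"
git checkout -q test1
git cherry-pick -x master >/dev/null
git revert --no-edit HEAD >/dev/null

# Let Git compute the merge, but stop before the commit is made.
git merge --no-commit master >/dev/null

# Put back test1's version of the file (the reverted state) in index and working tree.
git checkout HEAD -- file.txt

# Conclude the merge, marking clearly that the result was adjusted by hand.
git commit -q -m "Merge master into test1, keeping the revert (evil merge)"

cat file.txt
```

Here the last line prints "original": the merge commit has both parents, but its snapshot keeps the revert.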

torek answered Oct 20 '22 17:10


In the situation you describe, this is the expected result of the git merge operation, and torek's answer gives a detailed description of why.

On the git side of things:

One issue with the workflow you mention is that git cherry-pick doesn't record a link to the original commit. Managing a cherry-pick is therefore a bit like managing a "copy / paste" in code: if a bug is fixed on one side, the fix has to be ported manually to the other side.

The same goes for git rebase, by the way: if a rebase leaves the original branch in place and also creates a new branch with copies of some commits, that's also copy/pasting commits.

A more generic principle is:

"git merge succeeds without conflicts" doesn't mean "the resulting code is free of bugs".

You should always validate that the resulting code matches your expectations:

  • the resulting code should be reviewed,
  • you should validate your code with other steps, like compilation/build scripts, unit tests and QA,
  • in your example: if that issue is critical enough, you may for example add a linter (or any script that inspects the code) which checks that the offending code isn't present.
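The last point could look something like the following. This is a minimal sketch of "a script that inspects the code", not an existing tool: check_forbidden is a hypothetical helper, and the file name and the pattern dangerous_call( are made up. It uses plain grep on a directory; inside a repository, git grep would be a natural alternative, since it only looks at tracked files.

```shell
#!/bin/sh
set -eu

# check_forbidden PATTERN DIR (hypothetical helper, not a Git feature):
# succeeds only when the fixed string PATTERN occurs nowhere under DIR.
check_forbidden() {
    pattern=$1
    dir=$2
    if grep -rnF -- "$pattern" "$dir"; then
        echo "forbidden code is present: $pattern" >&2
        return 1
    fi
}

# Demo on a throwaway directory; the file and the pattern are made up.
dir=$(mktemp -d)
echo 'result = dangerous_call(data)' > "$dir/module.txt"

if check_forbidden "dangerous_call(" "$dir"; then
    echo "clean"
else
    echo "blocked"
fi
```

Run as a CI step or a pre-merge hook, a non-zero exit from such a check would stop the offending code from coming back unnoticed through a clean merge. In the demo it reports the matching line and prints "blocked".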

This is especially true for your release branch. In cases like the one you describe (where the developer merged into a side branch), it's probably more about making sure your team is aware of the potential traps and checks the diff.

LeGEC answered Oct 20 '22 15:10