
Recovering lost changes after reverting bad merge

I'm relatively new to Git and made a (stupid) mistake after misunderstanding a help article and I am not sure how to fully fix the issue using Git rather than manually reintroducing the changes to the target branch.

S---sc1---sc2---sc3-----sc4----M4---R1---M5---sc5
 \                         \  /         /
  T1-------------------M2---M3--------R2
   \               \  /
    F---fc1---fc2---M1

Some notes: S is the main branch in this scenario, T1 is the team branch pulled from S, and F is my feature branch pulled from T1.

I have auto-merges set up, so when commits are made to the T1 branch, they are run through continuous integration and then auto-merged up to S. There was one file in the T1 branch that was having merge conflicts with S from another team member's commit, so I decided to fix that once I was done with my work on F.

I merged T1 into F (M1), then F into T1 (M2). Given the issues I've had in the past with merge conflict resolutions not behaving how I expect, I thought I'd try something new: merging just the conflicting file from S into T1, solving the merge conflict there, removing all of the other files from the merge, and then allowing continuous integration to merge everything up to S.

I started a merge from S to T1 (M3) without committing, addressed the conflict, removed the other (~200) files from the merge, then committed. This auto-merged to S (M4).

I noticed immediately that excluding those ~200 files looked to have wiped the changes out entirely, which equated to about a month's worth of work across 2 teams. I (incorrectly) decided the best course of action was to act swiftly and revert the merge commits M4 and M3 before my mistake got into anyone else's local repos. I first reverted M4 (R1) and once that was committed I reverted M3 (R2). I figured that was the right order because I wasn't sure if the other way around would have introduced issues when the auto-merge kicked in. Eventually R2 was committed and auto-merged to S (M5).

This solved the issue of everyone else's changes being wiped out, but all of my changes in F plus the file that originally had the merge conflict were gone from S. I was able to commit the single file's changes directly to S (sc5), but the changes in F are a lot more complex. They live in T1 still, but since they were reverted from S as part of R1, I cannot just commit them back.

I've spent the better part of the day trying to figure out how to best get these changes up to S, but git rebase and git cherry-pick don't seem like they will do what I need, though I am very aware that I can be wrong on that. If anyone much better at Git than I am could suggest at least a starting point, that would be amazing. Thanks!

Edit: Removed unhelpful/confusing points from graph. M2 did not auto-merge up to S because of the merge conflict I attempted to resolve with M3.

Edit 2: After reading through the fantastic explanation from torek, I began attempting a rebase. I had forgotten that I had merged the T1 branch into the F branch multiple times throughout the history of F because of how much time this feature branch spanned. This meant there were many, many merge conflicts to resolve.

Following torek's response to this, I attempted a merge squash. My initial thought was that I needed to merge the new branch from the merge squash up to the T1 branch and then merge the T1 branch up to S, but I ran into the same issue where it doesn't see the changes. I assume this is because the changes already exist in T1, so it was basically just feeding the same, previously-reverted changes back into S, which doesn't want them.

Edit 3: Thanks to the very well-explained, detailed answer from torek (thank you so much!), I am going through with the merge squash and then merging the result of that up to the S branch after resolving the conflicts.

Tim asked Mar 06 '23 05:03


1 Answer

This is quite long, so feel free to skip over sections you already know (or scroll all the way to the end). Each section has setup information to explain what's going on, or what we are doing, in later ones.

Introduction-y bits

Let me start by re-drawing this graph (which I think is sort of a partial graph, but it contains the key commits we need) the way I prefer:

S0--sc1---sc2---sc3-----sc4----M4---R1---M5---sc5   <-- branch-S
  \                        \  /         /
   T0-------------o----M2---M3--------R2   <---- branch-T1
    \              \  /
    F0--fc1---fc2---M1   <------------------- branch-F

Here, the branch names are branch-S, branch-T1, and branch-F, and these names currently identify commits whose hash IDs are something unpronounceable and impossible for humans to remember, but we are calling sc5, R2, and M1 respectively. Any o nodes are commits that are not especially distinguished in any way, and may actually represent some arbitrary number of commits. The named fc<number>s are some set of commits on the feature branch, with the M<number> commits being merges. I renamed the first commits S0, T0, and F0 just to tell them apart from the branch names.

Some merges are made manually:

$ git checkout <branch-name>
$ git merge [options] <other-branch>
... fix up conflicts if necessary, and git commit (or git merge --continue)

Other merges are made by software and happen only if there are no conflicts. The R commits are from running:

git checkout <branch>
git revert -m 1 <hash ID of some M commit>

where <branch> was either T1 or S, and -m 1 is because you always have to tell git revert which parent to use when reverting a merge, and it's almost always parent #1.

Making commits moves a branch name

The simplest Git commit graph is a straight line, with one branch name, typically master:

A--B--C   <-- master (HEAD)

Here, we need to mention Git's index. The index is perhaps best described as the place where Git builds the next commit to make. It initially contains every file as saved in the current commit (here C): you check out this commit, populating the index and work-tree with the files from commit C. The name master points to this commit, and the name HEAD is attached to the name master.

You then modify files in the work-tree, use git add to copy them back into the index, use git add to copy new files into the index if needed, and run git commit. Making a new commit works by freezing these index copies into a snapshot. Git then adds the snapshot metadata—your name and email, your log message, and so on—along with the current commit's hash ID, so that the new commit points back to the existing commit. The result is:

A--B--C   <-- master (HEAD)
       \
        D

with the new commit, with its new unique hash ID, just hanging out in midair, with nothing to remember it. So, the last step of making a new commit is to write the new commit's hash ID into the branch name:

A--B--C--D   <-- master (HEAD)

and now the current commit is D, and the index and the current commit match. If you git add-ed all the files in the work-tree, that too matches the current commit and the index. If not, you can git add more files and commit again, making the name master point to new commit E, and so on. In any case, the (single) parent of the new commit is whatever the current commit was.
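As a rough illustration of the steps above, the following sketch builds a throwaway repository and checks that committing moves the branch name and that the new commit's parent is the old tip. The temporary directory, file name, and identity settings are placeholders chosen for this example only.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch explicitly
git config user.name "Example"            # placeholder identity for the demo
git config user.email "example@example.com"
git config commit.gpgsign false

echo "first" > file.txt
git add file.txt
git commit -q -m "C"
old_tip=$(git rev-parse master)

echo "second" >> file.txt      # modify the work-tree copy
git add file.txt               # copy it back into the index
git commit -q -m "D"           # freeze the index into a new snapshot
new_tip=$(git rev-parse master)
parent_of_new=$(git rev-parse master^)

echo "master was $old_tip"
echo "master is  $new_tip"
echo "parent of the new commit: $parent_of_new"
```

The last line should print the same hash as the first one: the new commit points back to what used to be the tip.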

About merges

Let me outline how git merge actually works. It's very simple in some cases and some ways, and let's use the simplest true-merge case to start with. Consider a graph that looks like this:

          o--...--L   <-- mainline (HEAD)
         /
...--o--*
         \
          o--...--R   <-- feature

We have run git checkout mainline; git merge feature, so we are telling Git to merge branch feature / commit R into branch mainline / commit L. To do this, Git must first find the merge base commit. The merge base is, roughly speaking, the "nearest" commit common to—i.e., reachable from—both branches. In this simple case, we start at L and walk backwards to older commits, and start at R and walk backwards, and the first place we meet is commit *, so that's the merge base.

(For much more about reachability, see Think Like (a) Git.)

Having found the merge base, Git needs to turn both the L (left-side / local / --ours) and R (right-side / remote / --theirs) snapshots into change-sets. These change-sets tell Git what we did, on mainline, since the merge base *, and what they did, on feature, since the merge base. These three commits all have hash IDs, which are the real names of the three commits, so Git can internally run the equivalent of:

git diff --find-renames <hash-of-*> <hash-of-L>   # what we changed
git diff --find-renames <hash-of-*> <hash-of-R>   # what they changed

The merge simply combines the two sets of changes, and applies the combined set to the files in the snapshot in *.
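To see these pieces by hand, here is a small sketch in a scratch repository. The branch names mainline and feature follow the drawing above; the file names and identity settings are made up for the example. It asks Git for the merge base and then runs the same two diffs.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example"            # placeholder identity for the demo
git config user.email "example@example.com"
git config commit.gpgsign false

printf 'a\nb\nc\n' > file.txt
git add file.txt
git commit -q -m "base"                   # plays the role of commit *
base=$(git rev-parse HEAD)
git branch feature

echo "ours" > ours.txt
git add ours.txt
git commit -q -m "L"                      # our work on mainline

git checkout -q feature
echo "theirs" > theirs.txt
git add theirs.txt
git commit -q -m "R"                      # their work on feature
git checkout -q mainline

found=$(git merge-base mainline feature)
echo "merge base: $found"
git diff --find-renames --stat "$found" mainline   # what we changed
git diff --find-renames --stat "$found" feature    # what they changed
```

Here the two change-sets touch different files, so a real merge would combine them without any conflict.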

When all goes well, Git makes the new commit in the usual way, except that the new commit has two parents. This makes the current branch point to the new merge commit:

          o--...--L
         /         \
...--o--*           M   <-- mainline (HEAD)
         \         /
          o--...--R   <-- feature

The first parent of M is L, and the second is R. This is why reverts almost always use parent #1, and why git log --first-parent only "sees" the mainline branch, traversing from M up to L while ignoring the R branch entirely. (Note that the word branch here refers to the structure of the graph, rather than branch names like feature: at this point, we can delete the name feature entirely. See also What exactly do we mean by "branch"?)
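The parent ordering can be checked directly. This is a minimal sketch in a scratch repository (file names and identity settings are placeholders) that makes a true merge and then inspects M^1, M^2, and the first-parent history.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example"            # placeholder identity for the demo
git config user.email "example@example.com"
git config commit.gpgsign false

echo base > base.txt
git add base.txt
git commit -q -m "base"
git branch feature

echo left > left.txt
git add left.txt
git commit -q -m "L"
left=$(git rev-parse HEAD)

git checkout -q feature
echo right > right.txt
git add right.txt
git commit -q -m "R"
right=$(git rev-parse HEAD)

git checkout -q mainline
git merge -q -m "M" feature               # histories diverged, so this is a true merge

p1=$(git rev-parse HEAD^1)                # first parent: where mainline was
p2=$(git rev-parse HEAD^2)                # second parent: the merged-in tip
subjects=$(git log --first-parent --format=%s)
echo "$subjects"
```

The first-parent log lists M, L, and base, and never visits R.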

When things go wrong

A merge will stop, with a merge conflict, if the two change-sets overlap in a "bad way". In particular, suppose that the base-vs-L says to change line 75 of file F, and the base-vs-R also says to change line 75 of file F. If both change-sets say to make the same change, Git is OK with this: the combination of the two changes is to make the change once. But if they say to make different changes, Git declares a merge conflict. In this case, Git will stop after doing whatever it can on its own, and make you clean up the mess.

Since there are three inputs, Git will, at this point, leave all three versions of file F in the index. Normally the index has one copy of each file to be committed, but during this conflict resolution phase, it has up to three copies. (The "up to" part is because you can have other kinds of conflicts, which I won't go into here for space reasons.) Meanwhile, in the work-tree copy of file F, Git leaves its approximation to the merge, with either two, or all three, sets of lines in the work-tree file with <<<<<<< / >>>>>>> markers around them. (To get all three, set merge.conflictStyle to diff3. I prefer this mode for resolving conflicts.)
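A quick way to look at those index copies is git ls-files -u, which lists unmerged entries with their stage numbers (1 is the merge base, 2 is ours, 3 is theirs). The sketch below provokes a conflict in a scratch repository; the file name F.txt and the identity settings are stand-ins for this example.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example"            # placeholder identity for the demo
git config user.email "example@example.com"
git config commit.gpgsign false

echo "original" > F.txt
git add F.txt
git commit -q -m "base"
git branch feature

echo "ours" > F.txt
git commit -q -am "L"

git checkout -q feature
echo "theirs" > F.txt
git commit -q -am "R"

git checkout -q mainline
git merge feature >/dev/null 2>&1 || true # stops with a conflict in F.txt

git ls-files -u                           # one index entry per stage for F.txt
unmerged=$(git ls-files -u | wc -l | tr -d ' ')
cat F.txt                                 # work-tree copy with conflict markers
```

Running git add F.txt after editing the file collapses the three staged copies back into one.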

As you have seen, you can resolve these conflicts any way you like. Git assumes that whatever you do is the right way to resolve the problem: that this produces the exactly-correct final merged files, or lack of files in some cases.

Whatever you do, though, the final merge—assuming you don't abort it, and are not using one of the non-merge-y variants of merge—still makes the same result in the graph, and whatever you put in the index, by resolving the conflicts, is the result of the merge. That's the new snapshot in the merge commit.

More-complex merge bases

When the graph is very simple like the one above, the merge base is easy to see. But graphs don't stay simple, and yours isn't. The merge base for a graph that has some merges in it is trickier. Consider, e.g., just the following fragment:

 ...--sc4----M4---R1
         \  /
...--M2---M3--------R2

If R1 and R2 are two tip commits, what is their merge base? The answer is M3, not sc4. The reason is that while M3 and sc4 are both commits that are reachable by starting at both R1 and R2 and working backwards, M3 is "closer" to R2 (one step back). The distance from R1 to either M3 or sc4 is two hops—go to M4, then go back one more step—but the distance from R2 to M3 is one hop and the distance from R2 to sc4 is two hops. So M3 is "lower" (in graph terms) and therefore wins the contest.
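This fragment is small enough to rebuild and check. The sketch below uses stand-in commits for sc4 and M2 in a scratch repository (file names and identity settings are invented for the example), recreates M3, M4, R1, and R2, and asks Git for the merge base of the two tips.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/branch-S
git config user.name "Example"            # placeholder identity for the demo
git config user.email "example@example.com"
git config commit.gpgsign false

echo s0 > s.txt
git add s.txt
git commit -q -m "S0"
git branch branch-T1

echo sc4 >> s.txt
git commit -q -am "sc4"

git checkout -q branch-T1
echo m2 > t.txt
git add t.txt
git commit -q -m "M2 stand-in"
git merge -q -m "M3" branch-S             # merge S into T1
m3=$(git rev-parse HEAD)

git checkout -q branch-S
git merge -q --no-ff -m "M4" branch-T1    # the auto-merge up to S
git revert --no-edit -m 1 HEAD >/dev/null # R1 reverts M4

git checkout -q branch-T1
git revert --no-edit -m 1 HEAD >/dev/null # R2 reverts M3

base=$(git merge-base branch-S branch-T1)
echo "merge base of R1 and R2: $base"
echo "M3:                      $m3"
```

Both lines should show the same hash.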

(Fortunately, your graph has no cases where there is a tie. If there is a tie, Git's default approach is to merge all the tied commits, two at a time, to produce a "virtual merge base", which is in fact an actual, albeit temporary, commit. It then uses this temporary commit made by merging the merge bases. This is the recursive strategy, which gets its name from the fact that Git recursively merges the merge bases to get a merge base. You can choose instead the resolve strategy, which simply picks one of the bases at seemingly-random—whichever base pops out at the front of the algorithm. There's rarely any advantage to that: the recursive method usually either does the same thing, or is an improvement over randomly selecting a winner.)

The key takeaway here is that making a merge commit changes which commit future merges will choose as their merge base. This is important even when making simple merges, which is why I am emphasizing it. It's why we make merge commits, as opposed to squash-"merge" operations that aren't merges. (But squash merges are still useful, as we will see in a bit.)

Introducing the problem: what went wrong (so you can avoid it in the future)

With the above out of the way, now we can look at the real problem. Let's start with this (edited slightly to use the updated commit and branch names):

I merged branch-T1 into branch-F (M1), then branch-F into branch-T1 (M2).

I assume here that merging fc2 (as the then-tip of branch-F) and o (as the then-tip of branch-T1) went well, and Git was able to make M1 on its own. As we saw earlier, merging is really based not on branches but on commits. It's the creation of a new commit that adjusts the branch names. So this created M1, so that branch-F pointed to M1. M1 itself pointed back to the existing tip of branch-T1 (a commit I've now marked o) as its second parent, with fc2 as its first parent. Git figures out the correct contents for this commit by git diff-ing the contents of T0, the merge base, against o and against fc2:

T0-------------o   <-- branch-T1
 \
 F0--fc1---fc2   <--- branch-F (HEAD)

With all going well, Git now makes M1 on its own:

T0-------------o   <-- branch-T1
 \              \
 F0--fc1---fc2---M1   <--- branch-F (HEAD)

Now you git checkout branch-T1 and git merge --no-ff branch-F (without --no-ff Git will just do a fast-forward, which is not what is in the picture), so Git finds the merge base of o and M1, which is o itself. This merge is easy: the difference from o to o is nothing, and nothing plus the difference from o to M1 equals the contents of M1. So M2, as a snapshot, is exactly the same as M1, and Git easily creates it:

T0-------------o----M2   <-- branch-T1 (HEAD)
 \              \  /
 F0--fc1---fc2---M1   <--- branch-F

So far, so good, but now things start to go really wrong:

There was one file in the T1 branch that was having merge conflicts with S ... Given the issues I've had in the past with merge conflict resolutions not behaving how I expect, I thought I'd try something new: merging just the conflicting file from S into T1, solving the merge conflict there, removing all of the other files from the merge, and then allowing continuous integration to merge everything up to S.

So, what you did at this point is:

git checkout branch-T1
git merge branch-S

which stopped with a merge conflict. The graph at this point is the same as the one above, but with some more context:

S0--sc1---sc2---sc3-----sc4   <-- branch-S
  \
   T0-------------o----M2   <-- branch-T1 (HEAD)
    \              \  /
    F0--fc1---fc2---M1   <-- branch-F

The merge operation finds the merge base (S0), diffs that against the two tip commits (M2 and sc4), combines the resulting changes, and applies them to the contents of S0. The one conflicted file is now in the index as the three input copies, and in the work-tree as Git's effort at merging, but with conflict markers. Meanwhile all the unconflicted files are in the index, ready to be frozen.

Alas, you now remove some files (git rm) during the conflicted merge. This removes the files from the index and work-tree both. The resulting commit, M3, will say that the correct way to combine commits M2 and sc4 based on merge-base S0 is to remove those files. (This of course was the mistake.)
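Here is a scaled-down reproduction of that mistake, assuming (as above) that the files were removed with git rm. One file conflicts, one other file stands in for the ~200, and all file contents and identity settings are invented for the example.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/branch-S
git config user.name "Example"            # placeholder identity for the demo
git config user.email "example@example.com"
git config commit.gpgsign false

printf 'line\n' > conflict.txt
printf 'shared work\n' > other.txt
git add .
git commit -q -m "S0"
git branch branch-T1

printf 'S version\n' > conflict.txt
printf 'shared work, updated on S\n' > other.txt
git commit -q -am "sc4"

git checkout -q branch-T1
printf 'T1 version\n' > conflict.txt
git commit -q -am "M2 stand-in"

git merge branch-S >/dev/null 2>&1 || true # conflict in conflict.txt only
printf 'resolved\n' > conflict.txt
git add conflict.txt
git rm -q -f other.txt                    # the mistake: drop the unconflicted file
git commit -q -m "M3"

git checkout -q branch-S
git merge -q --no-ff -m "M4" branch-T1    # the auto-merge up to S
files_on_S=$(git ls-tree --name-only HEAD)
echo "files now on branch-S: $files_on_S"
```

After M4, other.txt is gone from branch-S even though nobody on S ever deleted it.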

This auto-merged to S (M4).

Here, I assume this means that the system, using whatever pre-programmed rule it has, did the equivalent of:

git checkout branch-S
git merge --no-ff branch-T1

which found the merge base of commits sc4 (tip of branch-S) and M3, which is sc4, the same way that the merge base of o and M1 was o earlier. So the new commit, M4, matches M3 in terms of content, at which point we have:

S0--sc1---sc2---sc3-----sc4----M4   <-- branch-S
  \                        \  /
   T0-------------o----M2---M3   <-- branch-T1
    \              \  /
    F0--fc1---fc2---M1   <-- branch-F

I noticed immediately that excluding those ~200 files looked to have wiped the changes out entirely, which equated to about a month's worth of work across 2 teams. I (incorrectly) decided the best course of action was to act swiftly and revert the merge commits M4 and M3 before my mistake got into anyone else's local repos. I first reverted M4 (R1) and once that was committed I reverted M3 (R2).

Actually, this was a fine thing to do! It gets the right content, which is pretty useful when you do it immediately. Using git checkout branch-S && git revert -m 1 branch-S (or git revert -m 1 <hash-of-M4>) to create R1 from M4 basically undoes the merge in terms of content, so that:

git diff <hash-of-sc4> <hash-of-R1>

should produce nothing at all. Likewise, using git checkout branch-T1 && git revert -m 1 branch-T1 (or the same with the hash) to create R2 from M3 undoes that merge in terms of content: comparing M2 and R2, you should see identical content.

Undoing a merge undoes the contents, but not the history

The problem now is that Git believes that all the changes in your feature branch are correctly incorporated. Any git checkout branch-T1 or git checkout branch-S followed by git merge <any commit within branch-F> will look at the graph, following the backwards-pointing links from commit to commit, and see that this commit within branch-F—such as fc2 or M1—is already merged.
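Both halves of this can be seen in a scratch repository: the revert restores the pre-merge content, yet a second git merge of the same branch does nothing. The sketch below uses made-up branch, file, and identity names.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example"            # placeholder identity for the demo
git config user.email "example@example.com"
git config commit.gpgsign false

echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature work"

git checkout -q mainline
echo main > main.txt
git add main.txt
git commit -q -m "mainline work"
pre_merge=$(git rev-parse HEAD)

git merge -q --no-ff -m "M" feature
git revert --no-edit -m 1 HEAD >/dev/null # undo the merge's content

if git diff --quiet "$pre_merge" HEAD; then
    echo "content matches the pre-merge commit"
fi

before=$(git rev-parse HEAD)
git merge feature                         # Git reports there is nothing to merge
after=$(git rev-parse HEAD)
```

The branch tip does not move, and feature.txt stays missing, because the graph still says feature is merged.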

The trick to getting them in is to make a new commit that does the same thing that the commit-sequence from F0 through M1 does, that's not already merged. The easiest—though ugliest—way to do that is to use git merge --squash. The harder, and perhaps better, way to do that is to use git rebase --force-rebase to make a new feature branch. (Note: this option has three spellings and the easiest one to type is -f, but the one in Linus Torvalds' description is --no-ff. I think the most memorable is the --force-rebase version, but I would actually use -f myself.)

Let's take a fast look at both, and then consider which to use and why. In either case, once you are done, you'll have to merge the new commit(s) correctly this time, without removing files; but now that you know what git merge is really doing, it should be a lot easier to do.

We start by creating a new branch name. We can re-use branch-F, but I think it is clearer if we don't. If we want to use git merge --squash, we create this new branch name pointing to commit T0 (ignoring the fact that there are commits after T0—remember, any branch name can point to any commit):

T0   <-- revised-F (HEAD)
 \
  F0--fc1--fc2--M1   <-- branch-F

If we want to use git rebase -f, we create this new name pointing to commit fc2:

T0-----....
 \
  F0--fc1--fc2   <-- revised-F (HEAD)
              \
               M1   <-- branch-F

We do this with:

git checkout -b revised-F <hash of T0>   # for merge --squash method

or:

git checkout -b revised-F branch-F^1      # for rebase -f method

depending on which method we want to use. (The ^1 or ~1 suffix—you can use either one—excludes M1 itself, stepping back one first-parent step to fc2. The idea here is to exclude commit o and any other commits reachable from o. There need to be no other merges into branch-F along that bottom row of commits, here.)

Now, if we want to use a "squash merge" (which uses Git's merge machinery without making a merge commit), we run:

git merge --squash branch-F

This uses our current commit, plus the tip of branch-F (commit M1), as the left and right sides of the merge, finding their common commit as the merge base. The common commit is simply the commit that revised-F points to (T0 if you created the branch as shown above; the drawing below starts from F0 instead, and either works), so the merge result is the snapshot in M1. However, the new commit made has only one parent: it is not a merge commit at all, and it looks like this:

   fc1--fc2--M1   <-- branch-F
  /
F0-------------F3   <-- revised-F (HEAD)

The snapshot in F3 matches that in M1, but the commit itself is all new. It gets a new commit message (which you may edit) and its effect, when Git looks at F3 as a commit, is to make the same set of changes made from F0 to M1.
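The two properties of F3 (one parent, same snapshot as M1) can be checked in a scratch repository. This sketch rebuilds a tiny version of the T0 / fc1 / fc2 / o / M1 shape, starting revised-F from T0; file names and identity settings are placeholders for the example.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/branch-T1
git config user.name "Example"            # placeholder identity for the demo
git config user.email "example@example.com"
git config commit.gpgsign false

echo t0 > base.txt
git add base.txt
git commit -q -m "T0"
t0=$(git rev-parse HEAD)

git checkout -q -b branch-F
echo 1 > f1.txt; git add f1.txt; git commit -q -m "fc1"
echo 2 > f2.txt; git add f2.txt; git commit -q -m "fc2"

git checkout -q branch-T1
echo o > team.txt; git add team.txt; git commit -q -m "o"

git checkout -q branch-F
git merge -q -m "M1" branch-T1            # merge the team branch into the feature

git checkout -q -b revised-F "$t0"
git merge --squash branch-F >/dev/null    # stages the result, makes no commit
git commit -q -m "F3: squashed feature work"

parent_count=$(git rev-list --parents -n 1 HEAD | awk '{print NF - 1}')
echo "F3 has $parent_count parent(s)"
git diff --quiet branch-F revised-F && echo "F3 and M1 have identical content"
```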

If we choose the rebase method, we now run:

git rebase -f <hash-of-T0>

(You could use instead the hash of o, which is branch-F^2, i.e., the second parent of M1. In this case you can start with revised-F pointing to M1 itself. That's probably what I would do, to avoid having to cut and paste a lot of hash IDs with potential typos, but it's not obvious how this works unless you've done a lot of graph manipulation exercises.)

That is, we want to copy commits F0 through fc2 inclusive to new commits, with new hash IDs. That's what this git rebase will do (see other StackOverflow answers and/or Linus' description above): we get:

  F0'-fc1'-fc2'   <-- revised-F (HEAD)
 /
T0-----....
 \
  F0--fc1--fc2--M1   <-- branch-F
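A simplified sketch of the rebase variant follows, without the M1 merge, so revised-F can start at the feature tip. The original commits are given a fixed, old committer date purely so that the copies are guaranteed to get different hash IDs even when the script runs within one second; that date, the file names, and the identity settings are all assumptions of this example.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/branch-T1
git config user.name "Example"            # placeholder identity for the demo
git config user.email "example@example.com"
git config commit.gpgsign false

echo t0 > base.txt
git add base.txt
git commit -q -m "T0"
t0=$(git rev-parse HEAD)

git checkout -q -b branch-F
old_date="2000-01-01 00:00:00 +0000"      # arbitrary old committer date
echo 1 > f1.txt; git add f1.txt
GIT_COMMITTER_DATE="$old_date" git commit -q -m "fc1"
echo 2 > f2.txt; git add f2.txt
GIT_COMMITTER_DATE="$old_date" git commit -q -m "fc2"

git checkout -q -b revised-F              # starts at fc2, like branch-F
git rebase -f "$t0" >/dev/null 2>&1       # copy the commits instead of reusing them

echo "branch-F:  $(git rev-parse branch-F)"
echo "revised-F: $(git rev-parse revised-F)"
echo "common ancestor: $(git merge-base revised-F branch-F)"
```

The two tips have different hash IDs but the same content, and the only history they share is T0.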

Now that we have revised-F pointing to either a single commit (F3) or a chain of commits (the chain ending at fc2', the copy of fc2), we can git checkout some other branch and git merge revised-F.

Based on comments, here are two paths for doing the re-merge

I assume at this point that you have a squash-merge result (a single-parent commit that's not a merge, but does contain the desired snapshot, which I'm calling F3 here). We need to revise the re-drawn graph a bit too, based on comments that indicate there were more merges into branch-F:

S0--sc1---sc2---sc3-----sc4----M4---R1---M5---sc5   <-- branch-S
  \                        \  /         /
   T0-----o-------o----M2---M3--------R2   <---- branch-T1
    \      \       \  /
    F0--fc1-o-fc2---M1   <--------------- branch-F

Now we'll add the revised-F branch, which should have a single commit that is a descendant of either F0 or T0. It's not crucial which one. Since I used F0 earlier, let's go with that here:

S0--sc1---sc2---sc3-----sc4----M4---R1---M5---sc5   <-- branch-S
  \                        \  /         /
   T0-----o-------o----M2---M3--------R2   <---- branch-T1
    \      \       \  /
    F0--fc1-o-fc2---M1   <--------------- branch-F
      \
       ---------------------------------F3   <-- revised-F

The contents of commit F3 match those of M1 (so git diff branch-F revised-F says nothing), but the parent of F3 here is F0. (Note: there are shortcut ways to create F3 using git commit-tree, but as long as it already exists and matches M1 content-wise, we can just use it.)

If we now do:

git checkout branch-T1
git merge revised-F

Git will find the merge base between commit R2 (tip of branch-T1) and F3 (tip of revised-F). If we follow all the backwards (leftwards) links from R2, we can get to T0 via M3 then M2 then some number of os and finally T0, or we can get to F0 via M3 then M2 then M1 then fc2 on back to F0. Meanwhile we can get from F3 straight to F0, in just one hop, so the merge base is probably F0.

(To confirm this, use git merge-base:

git merge-base --all branch-T1 revised-F

This will print one or more hash IDs, one for each merge base. Ideally there's just the one merge base, which is commit F0.)

Git will now run the two git diffs, to compare the contents of F0 to F3—i.e., everything we did to accomplish the feature—and to compare the contents of F0 to those of R2, at the tip of branch-T1. We'll get conflicts where both diffs change the same lines of the same files. Elsewhere, Git will take the contents of F0, apply the combined changes, and leave the result ready to be committed (in the index).

Resolving these conflicts and committing will give you a new commit that results in:

S0--sc1---sc2---sc3-----sc4----M4---R1---M5---sc5   <-- branch-S
  \                        \  /         /
   T0-----o-------o----M2---M3--------R2-----M6   <---- branch-T1
    \      \       \  /                     /
    F0--fc1-o-fc2---M1   <-- branch-F      /
      \                                   /
       ---------------------------------F3   <-- revised-F

Now M6 is, perhaps, merge-able to branch-S.


Alternatively, we can merge directly to branch-S. It's less obvious which commit is the merge base, but it is probably F0 again. Here is the same drawing again:

S0--sc1---sc2---sc3-----sc4----M4---R1---M5---sc5   <-- branch-S
  \                        \  /         /
   T0-----o-------o----M2---M3--------R2   <---- branch-T1
    \      \       \  /
    F0--fc1-o-fc2---M1   <--------------- branch-F
      \
       ---------------------------------F3   <-- revised-F

Starting from commit sc5, we work backwards to M5 to R2, and we're now in the same situation we were before. So we can git checkout branch-S and do the same merge, resolve similar conflicts—this time we're comparing F0 to sc5 rather than to R2, so the conflicts might be slightly different—and eventually commit:

S0--sc1---sc2---sc3-----sc4----M4---R1---M5---sc5----M6   <-- branch-S
  \                        \  /         /           /
   T0-----o-------o----M2---M3--------R2   <------ / -- branch-T1
    \      \       \  /                           /
    F0--fc1-o-fc2---M1   <-- branch-F            /
      \                                         /
       ---------------------------------------F3   <-- revised-F

To verify that F0 is the merge base, use git merge-base as before:

git merge-base --all branch-S revised-F

and to see what you'd have to merge, run two git diffs from the merge base to the two tips.

(Which merge to do is up to you.)

torek answered Mar 16 '23 12:03