Duplicate commits after rebase have been merged into the develop branch

I recently pulled down a remote branch that another developer was working on, let's call it feature. I then made the mistake of doing a rebase onto develop - the main working branch - which I've now come to know is something that you shouldn't do. The feature branch has since been merged into develop.

The problem I have now is that develop has a strange Git history. All of the commits from feature seem to be duplicated: each one appears twice, but with a different commit ID.

My history looks a bit like this now (IDs are for demo purposes):

0008 commit from feature #3  <--- these commits are duplicated
0007 commit from feature #2
0006 commit from feature #1
0005 different commit from another branch #2
0004 different commit from another branch #1
0003 commit from feature #3
0002 commit from feature #2
0001 commit from feature #1

I've made a silly mistake! Is there anything I can do about this? The history looks ugly, but all the correct code seems to be there. Can I remove the duplicated commits? Or is there any other way to clean up the history?

Please write your answer for a less experienced Git user.

shrewdbeans asked Nov 11 '16 15:11


2 Answers

What happened

"Copy commits" is just what git rebase does. It copies some commits, then shuffles the branch pointers around so as to "forget" or "abandon" the original commits. (But see below.)

Here is an illustration of how git rebase does this copying. The single letters represent commits, and the names on the right are branch names, which actually point only to one commit, namely the "tip of the branch". Each commit points back to its parent commit, i.e., the A--B connector lines are really meant to be left-pointing arrows (and ones that go diagonally also still point left, to earlier commits, with later commits being towards the right):

     C--D   <-- branch1
    /
A--B
    \
     E      <-- branch2

This is the "before" picture, where you have only "original" commits. You now decide to git checkout branch1 and git rebase branch2 so that C and D come after E. But Git can't actually change the original C--D at all, so instead it makes new copies, C' and D', which are slightly different: they come after E (and so also include whatever code changes you made in E):

     C--D      [abandoned]
    /
A--B
    \
     E         <-- branch2
      \
       C'-D'   <-- branch1
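
To see the copying for yourself, here is a minimal sketch that builds the graph above in a throwaway repository. The commit helper, the file names, and the demo identity are made up for illustration; only the branch names come from the drawing. It prints the hash of the original C and of the copy C', then uses git patch-id to compare the change each one introduces:

```shell
set -e
cd "$(mktemp -d)"                      # scratch directory, safe to delete
git init -q
git symbolic-ref HEAD refs/heads/branch2
git config user.name "Demo User"       # hypothetical identity for the demo
git config user.email "demo@example.com"

# Hypothetical helper: one new file per commit, so nothing ever conflicts.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b branch1
commit C; commit D
git checkout -q branch2
commit E

old_C=$(git rev-parse branch1~1)       # the original C
git checkout -q branch1
git rebase -q branch2
new_C=$(git rev-parse branch1~1)       # the copy, C'

echo "original C : $old_C"
echo "copy     C': $new_C"

# patch-id summarizes the diff only, not the parent or the commit hash.
old_patch=$(git show "$old_C" | git patch-id | cut -d' ' -f1)
new_patch=$(git show "$new_C" | git patch-id | cut -d' ' -f1)
echo "patch-id of C : $old_patch"
echo "patch-id of C': $new_patch"
```

The two commit hashes differ while the two patch IDs match, which is the "same change, different commit" situation that shows up as duplicates in a log.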

Completely forgetting the original C--D would be OK here, but what if you decide the rebase was a bad idea after all? To let you recover, rebase records the original tip of the branch in your "reflogs". It also saves it under the special name ORIG_HEAD. ORIG_HEAD is much easier to use, but there is only one ORIG_HEAD, while a reflog can hold a great many entries. Reflog entries are kept for at least 30 days by default, giving you time to change your mind. Look back up at the second graph and imagine that ORIG_HEAD is added, pointing to the original D.
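
A small sketch of where the original tip ends up after a rebase, again in a scratch repository with a made-up commit helper and demo identity. After the rebase, both ORIG_HEAD and the reflog name branch1@{1} should resolve to the old commit D:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/branch2
git config user.name "Demo User"       # hypothetical identity for the demo
git config user.email "demo@example.com"

# Hypothetical helper: one new file per commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b branch1
commit C; commit D
git checkout -q branch2
commit E

git checkout -q branch1
old_tip=$(git rev-parse branch1)       # the original D, before rebasing
git rebase -q branch2

echo "old tip (D)  : $old_tip"
echo "ORIG_HEAD    : $(git rev-parse ORIG_HEAD)"
echo "branch1@{1}  : $(git rev-parse 'branch1@{1}')"
echo "branch1 (D') : $(git rev-parse branch1)"

# The branch's own reflog lists every value the branch name has had.
git reflog branch1
```

Note that later commands such as git reset and git merge overwrite ORIG_HEAD, so the reflog is the more durable of the two.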

Now, the problem you have hit occurs because it's not just branch names that remember previous commits. Each commit also remembers its own previous commits, via those connecting, left-pointing arrows. So let's see what happens if there was either another name, or some other (merge) commit, remembering C and D. For instance, what if we have this much more complicated starting graph:

    .-----F    <-- branch3
   /     /
  /  C--D      <-- branch1
 /  /
A--B
    \
     E         <-- branch2

If we now "rebase" branch1, we get this:

    .-----F    <-- branch3
   /     /
  /  C--D      [ORIG_HEAD and reflog]
 /  /
A--B
    \
     E         <-- branch2
      \
       C'-D'   <-- branch1

Commit F is a merge commit: it points back to both commit A and commit D. So it retains the original D, which retains the original C, giving us kind of a mess.

F could be a plain ordinary commit, pointing back only to D, and we would see the same problem. Plain ordinary commits are much easier to copy, though, so if F were not a merge—if our F pointed back only to D and not to A—we could carefully rebase branch3 as well, copying F to F', where F' comes after our new D'. It's possible to re-do the merge too, but that's a bit trickier (not that copying F correctly is all that easy either way—it's easy to "get lost" and copy C--D yet again by mistake).
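
The messy graph can be reproduced in a scratch repository. This is only a sketch: the commit helper, file names, and demo identity are invented, and the final merge is my assumption about how the duplicates ended up in one branch, as in the question. It counts how many commits with the subject "C" are reachable before and after that merge:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/branch2
git config user.name "Demo User"       # hypothetical identity for the demo
git config user.email "demo@example.com"

# Hypothetical helper: one new file per commit, so nothing ever conflicts.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git branch branch3                     # branch3 starts at A
commit B
git checkout -q -b branch1
commit C; commit D
git checkout -q branch2
commit E

# F: a merge commit on branch3 whose parents are A and D.
git checkout -q branch3
git merge -q --no-ff -m F branch1

# The mistake: rebase branch1 while branch3 still remembers C--D.
git checkout -q branch1
git rebase -q branch2

git log --graph --oneline --all

# Both the original C and the copy C' are reachable from some branch.
copies_of_C=$(git log --all --format=%s | grep -cx C)
echo "commits named C across all branches: $copies_of_C"

# Merging branch3 pulls the originals into branch1's own history too.
git merge -q -m "merge branch3" branch3
in_branch1=$(git log --format=%s branch1 | grep -cx C)
echo "commits named C in branch1 alone: $in_branch1"
```

After the last merge, git log on branch1 lists each of C and D twice, with different hashes, much like the history shown in the question.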

When this happens

You will encounter this problem whenever you copy commits that you or someone else made, and both you and the "someone else" (perhaps the "other you") are also still using the originals. This happened with our commit F, for instance: we were still using the original C--D chain. We can fix this by making a new F' and using that, as long as we're the only one using branch3. But if branch3 is published, or for that matter if we've published branch1, so that someone else might have them as origin/branch1 or origin/branch3, we have lost control over the original copies of C--D.

Hence the standard advice is to rebase only private (unpublished) commits, since you know who is using them—it's just you of course—and you can check with yourself and make sure you're not using them, or that it's OK to copy them because you also plan to copy or otherwise re-do commits like F.
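
One way to check whether a commit has already been published is to ask which remote-tracking branches contain it. The sketch below fakes a shared remote with a local bare repository (origin.git, the commit helper, and the demo identity are all assumptions for the demo) and uses git branch -r --contains:

```shell
set -e
cd "$(mktemp -d)"
git init -q --bare origin.git          # stand-in for the shared remote
git init -q work
cd work
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Demo User"       # hypothetical identity for the demo
git config user.email "demo@example.com"
git remote add origin ../origin.git

# Hypothetical helper: one new file per commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit C
git push -q origin branch1             # C is now published
commit D                               # D exists only in this repository

published=$(git branch -r --contains branch1~1)
unpublished=$(git branch -r --contains branch1)
echo "remote branches containing C:${published:- none}"
echo "remote branches containing D:${unpublished:- none}"
```

An empty answer means no remote-tracking branch you know about has the commit. This only reflects your last fetch or push, so it cannot tell you about copies someone else obtained another way.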

If you have done the rebase—made the copies—and published them (pushed them to origin), you're kind of stuck. You can "undo" your rebase anyway, and beg everyone else who shares the use of origin to make sure they don't use your C'-D' type copies for anything because you're putting the originals back.

(For more-advanced groups of users, you can even all agree that certain branches get rebased regularly, and you and they must all recognize when this happens, and all of you will then take care to switch to the new commit copies. However, this is probably not what you want to do right now!)

Undoing it

So, if you (a) can and (b) want to "undo" your rebase, this is where the reflog, or the saved ORIG_HEAD, really comes in handy. Let's take the second example again and look at what we have after we forgot that branch3 still remembers the original C--D commits:

    .-----F    <-- branch3
   /     /
  /  C--D      [ORIG_HEAD and reflog]
 /  /
A--B
    \
     E         <-- branch2
      \
       C'-D'   <-- branch1

Now, imagine we erase the name branch1 from the bottom row and write in a new <-- branch1 pointing to commit D:

    .-----F    <-- branch3
   /     /
  /  C--D      <-- branch1
 /  /
A--B
    \
     E         <-- branch2
      \
       C'-D'   [abandoned]

Now that we've abandoned C'-D', just stop looking at it. Compare this graph to the original graph, and voila! That's what you want!

The command that "moves" a branch label in arbitrary fashion like this is git reset (it moves the current branch, so you have to be on branch1). Look up the raw commit hash for D in the reflog, or check that ORIG_HEAD is correct, or use the reflog spelling to identify commit D. (For newbies, I find cut-and-paste of the raw hash is the way to go.) For instance, try:

$ git log --graph --decorate --oneline ORIG_HEAD

to see if ORIG_HEAD gets you the right hash. If not, try git reflog branch1 (looking at the specific reflog for branch1 here) to find hashes, then use:

$ git log --graph --decorate --oneline branch1@{1}

(or cut and paste the raw hash instead of using branch1@{1}). Once you've found the desired "original" commit, you can then:

$ git status     # to make sure you're on the right branch
                 # and that everything is clean, because
                 # "git reset --hard" wipes out in-progress work!
$ git reset --hard ORIG_HEAD

(or put in branch1@{1}, or the raw hash ID, in place of ORIG_HEAD as usual). [1] That moves the current branch (which we just checked) so that it points to the given commit (branch1@{1}, from the reflog, or ORIG_HEAD or a raw hash ID), to get us that final graph drawing back. The --hard sets both our index/staging-area, and our work-tree, to match the new commit to which we've just re-pointed our branch.
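
Here is the whole undo as a sketch in a scratch repository, so you can try it before touching real work. The commit helper, file names, and demo identity are invented for the example; the reset itself is the command described above:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/branch2
git config user.name "Demo User"       # hypothetical identity for the demo
git config user.email "demo@example.com"

# Hypothetical helper: one new file per commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b branch1
commit C; commit D
git checkout -q branch2
commit E

git checkout -q branch1
old_tip=$(git rev-parse branch1)       # the original D
git rebase -q branch2                  # the rebase we now regret

# Safety checks: we are on branch1 and there is no uncommitted work,
# because "git reset --hard" would silently discard it.
[ "$(git symbolic-ref --short HEAD)" = branch1 ]
[ -z "$(git status --porcelain)" ]

git reset -q --hard ORIG_HEAD          # move branch1 back to the original D

git log --graph --decorate --oneline --all
```

After the reset, branch1 points at the original D again and the work-tree no longer contains the file from E, since E was never part of the original branch1.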


[1] The general idea here, which recurs all the time in Git, is that we must name some specific commit, from which Git finds the rest of the commits if necessary. Any name works: a branch name, a name like HEAD, a reflog name like master@{1}, or a raw commit hash ID. Git does not really care how you tell it "look at this here commit"; ultimately, Git resolves that name to one of those big ugly SHA-1 hash IDs, and uses that.
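
As a quick illustration of that footnote, git rev-parse shows which hash a given name resolves to. This sketch uses a two-commit scratch repository (the branch name, file name, and demo identity are just for the example) and resolves several spellings of the same commit:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Demo User"       # hypothetical identity for the demo
git config user.email "demo@example.com"

echo one > file.txt; git add file.txt; git commit -q -m "first"
echo two > file.txt; git commit -q -am "second"

full=$(git rev-parse HEAD)             # the raw hash ID
short=$(git rev-parse --short HEAD)    # an abbreviated hash

# Every one of these names resolves to the same commit.
for name in HEAD branch1 'branch1@{0}' "$short" "$full"; do
    printf '%-12s -> %s\n' "$name" "$(git rev-parse "$name")"
done

# A reflog name for the branch's previous value: the "first" commit.
printf '%-12s -> %s\n' 'branch1@{1}' "$(git rev-parse 'branch1@{1}')"
```

If you are unsure what a name like ORIG_HEAD or branch1@{1} refers to, running git rev-parse on it before a reset is a cheap way to check.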

torek answered Sep 28 '22 00:09


Use the git reflog to revert your changes.

Read all about it in here (how to recover previous head/ how to undo changes):

What to do?

Type git reflog and find the "last good" SHA-1, the one you want to go back to. Then run:

git reset <SHA-1> --hard

You are now back at the commit you were on before you made your mistake.

CodeWizard answered Sep 28 '22 00:09