
Git: How to squash all commits between two commits into a single commit

Tags: git, rebase

I have a branch I've been working on personally over several computers for the past few months. The result is a long history chain that I want to clean up before I merge it onto the master branch. Ultimately the goal is to get rid of all those wip commits that I frequently make when working on server code.

Here is a screenshot of the gitk history visualization:

http://imgur.com/a/I9feO

Way at the bottom of this is the point where I branched off of master. Master has changed a bit since I started this branch, but the changes have been disjoint, so the merge should be a piece of cake. My usual workflow is to rebase onto master and then squash the wip commits.

I tried to execute a simple

git rebase -i master

and I edited the commits to squash.

It seemed to start off well, but then it failed and wanted me to address a conflict. However, it seemed like there was no good way to address it by looking at the diffs. Each piece was using variables that were undefined in the scope, so I wasn't sure how to resolve them.

I also attempted using git rebase -i -s recursive -X theirs master, which didn't result in a conflict, but it changed the state of HEAD from the revised branch (I want to edit history in such a way that the end result in HEAD does not change).

I believe these conflicts are arising from the parts of the chain where you can see a diamond pattern (e.g. between "reworked classifiers..." and "Merge branch iccv").


To phrase my question better let A="Merge branch iccv", and B="reworked classifiers" refer to the example in the image. And the commits in between will be X and Y.

      ...
       |
       |
       A 
     /  \
    |   X
    Y   |
     \ /
      B
      |
      |
     ...

I want to rewrite history so the state of A is exactly as it is, and effectively destroy intermediate representations X and Y, so the resulting history looks like this

      ...
       |
       |
       A 
       |
       |
       B
       |
       | 
      ...

Is there a way to squash the resolved state of A, X and Y into a single commit in the middle of a history chain like this?

If A and B are the SHA IDs of the commits, is there a simple command I can run (or perhaps a script) that achieves the result I want?

If A was the HEAD I believe I could do

git reset B
git commit -am "recreating the A state"

to create a new head, but how can I do this if A is in the middle of a history chain like this? I want to keep the history of all the nodes that come after it.

Erotemic asked May 06 '17 00:05




2 Answers

First make the current working tree clean and then run these commands:

# Keep a backup of the original branch, then work on a temporary branch
git branch backup thesis4
git checkout -b tmp thesis4

# Move tmp back to the merge commit A
git reset --hard A

# Move tmp to B while keeping A's tree staged in the index
git reset --soft B

# Commit the staged tree as S, the squash of X, Y, and A
git commit

# Replay every commit after A on top of S
git cherry-pick A..thesis4

# Point thesis4 at the rewritten history and drop the temporary branch
git checkout thesis4
git reset --hard tmp
git branch -D tmp

S is the squash of X, Y, and A. M and N are the commits that came after A; M' is equivalent to M and N' to N. In case you want to restore the initial state, run:

git checkout thesis4
git reset backup --hard
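
To try the sequence above without touching a real repository, here is a minimal sketch that builds a throwaway repository with the same diamond shape and then runs the steps. The scratch directory, the `side` branch, the file names, and the `mk` helper are all made up for illustration; only `thesis4`, `backup`, and `tmp` come from the commands above.

```shell
#!/bin/sh
set -eu

# Scratch repository (hypothetical layout, for illustration only)
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/thesis4
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: create one file and commit it with the same name
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

# Build B, the diamond (X on a side branch, Y on thesis4), merge A, then M and N
mk B; B=$(git rev-parse HEAD)
git checkout -q -b side
mk X
git checkout -q thesis4
mk Y
git merge -q --no-ff -m A side
A=$(git rev-parse HEAD)
mk M
mk N

# The steps from the answer
git branch backup thesis4
git checkout -q -b tmp thesis4
git reset -q --hard "$A"
git reset -q --soft "$B"
git commit -q -m "S: squash of X, Y and A"
git cherry-pick "$A..thesis4"
git checkout -q thesis4
git reset -q --hard tmp
git branch -q -D tmp

# Linear history on thesis4; the original is still reachable from backup
git log --oneline --graph thesis4
```

Note that `git cherry-pick A..thesis4` assumes the commits after A are ordinary (non-merge) commits; a merge in that range would need extra handling.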
ElpieKay answered Sep 19 '22 21:09


This can be done, but with the usual mechanisms it's anywhere from a bit of a pain to a lot of pain.

The fundamental problem is that you must copy commits to new (slightly different) commits whenever you want to change things, because no commit can ever change.[1] The hash ID of a commit is the commit, in a very real sense: Git's hash IDs are how Git finds the underlying object. Change any bit within the object and it gets a new, different hash ID.[2] Hence, when you want to go from:

       X
      / \
...--B   A--C--D--E   <-- branch
      \ /
       Y

to something that looks like:

...--B--A--C--D--E   <-- branch

the thing after B cannot be A, it has to be a different commit that just smells like A. We can call this commit A' to tell them apart:

...--B--A'-...

But if we copy A to a new, fresher-smelling (but same tree) A' that no longer has the intermediate stuff in its history—that is, A' connects directly to B—then we must also copy the first commit after A'. Once we do that, we must copy the commit after that one, and so on. The result is:

...--B--A'-C'-D'-E'  <-- branch

[1] Psychologists like to say that change is hard, but for Git, it's literally impossible! :-)

[2] Hash collisions are technically possible, but if they occur, they mean that your repository stops adding new things. That is, if you managed to come up with a new commit that was like the old one, but had your desired change, and had the same hash ID, Git would forbid you from adding it!
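
The "copy, never change" rule is easy to see in a throwaway repository. The sketch below (scratch directory and file names are made up) rewords a commit with `git commit --amend`: the tree stays the same, but the result is a new commit with a new hash ID, and the old commit is still in the object database.

```shell
#!/bin/sh
set -eu

# Scratch repository (hypothetical, for illustration only)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "original message"
old=$(git rev-parse HEAD)

# "Changing" the commit really makes a new one with a different hash ID
git commit -q --amend -m "reworded message"
new=$(git rev-parse HEAD)

echo "old commit: $old"
echo "new commit: $new"

# Same tree in both, and the old commit object still exists
git rev-parse "$old^{tree}" "$new^{tree}"
git cat-file -t "$old"
```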


Using git rebase -i

Note: Use this method if possible; it's much easier to understand and to get right.

The standard command that copies commits like this is git rebase. However, rebase deals very poorly with merge commits like A. In fact, it normally throws them out entirely, favoring instead linearizing everything:

...--B--X--Y'-C'-D'-E'   <-- branch

for instance.

Now, if merge commit A went well, i.e., nothing in X depends on Y or vice versa, a simple git rebase -i <hash-of-B> may suffice. You can change all but the first one of the picks for commits X and Y—which may actually be many commits—to squash and everything all just goes well and you are done: Git drops X and Y' entirely in favor of a single combined XY' commit that has the same tree your merge commit A had. The result is:

...--B--XY'-C'-D'-E'   <-- branch

and if we call XY' A', and then drop all the tick marks by forgetting their original hash IDs, we get just what you wanted.
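
For the easy case, here is a rough sketch of the squash in a scratch repository where X and Y touch different files. Normally you would edit the todo list by hand; to keep the sketch non-interactive it sets `GIT_SEQUENCE_EDITOR` to a small made-up script that turns the second `pick` into `squash`, and `GIT_EDITOR=true` to accept the combined message as-is. The branch names `work` and `side`, the `mk` helper, and the editor script are all assumptions for illustration.

```shell
#!/bin/sh
set -eu

# Scratch repository (hypothetical layout, for illustration only)
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/work
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: create one file and commit it with the same name
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk B; B=$(git rev-parse HEAD)
git checkout -q -b side
mk X
git checkout -q work
mk Y
git merge -q --no-ff -m A side
mk C
orig=$(git rev-parse HEAD)

# Stand-in for hand-editing the todo list: change the second "pick" to "squash"
todo_editor="$repo/.git/squash-second.sh"
cat > "$todo_editor" <<'EOF'
#!/bin/sh
awk '$1 == "pick" { n++; if (n == 2) sub(/^pick/, "squash") } { print }' "$1" > "$1.new"
mv "$1.new" "$1"
EOF

# The merge commit A is left out of the todo list; X and Y become one commit
GIT_SEQUENCE_EDITOR="sh '$todo_editor'" GIT_EDITOR=true git rebase -q -i "$B"

git log --oneline --graph
```

This only works smoothly because X and Y are independent here. If the original merge needed manual conflict resolution, the rebase stops and asks you to resolve the same conflicts again.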


Using git replace

If the merge was difficult, though, what you want is to preserve the tree from the merge, while dropping all the X and Y commits. Here git replace is the (or a) right solution. Git's replace is somewhat complicated, but you can instruct Git to make a new commit A' that is "like A but has B as its single parent hash ID". Git will now have this commit graph structure:

       X
      / \
...--B   A--C--D--E   <-- branch
     |\ /
     | Y
     \
      A'  <-- refs/replace/<complicated-thing>

This special refs/replace name tells Git that, when it is doing things like git log and other commands that use commit IDs, Git should turn its metaphorical eyes away from commit A and look instead at commit A'. Since A' is otherwise a copy of A, git checkout <hash of A> makes Git look at A' and check out the same tree; and git log shows the same log message when it looks aside at A' instead of A.

Note that both A and A' exist in the repository at this point. They are side-by-side, as it were, with Git just showing you A' instead of A unless you use the special --no-replace-objects flag. Once Git has shown you (and used) A' instead of A, it follows the backwards link from A' to B, skipping right over all of X and Y.
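
One way to create such an A' is `git replace --graft <commit> <parent>`, which makes a replacement commit with the same content as `<commit>` but the given parent(s). The sketch below tries it in a scratch repository; the branch names `work` and `side` and the `mk` helper are made up for illustration.

```shell
#!/bin/sh
set -eu

# Scratch repository (hypothetical layout, for illustration only)
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/work
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: create one file and commit it with the same name
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk B; B=$(git rev-parse HEAD)
git checkout -q -b side
mk X
git checkout -q work
mk Y
git merge -q --no-ff -m A side
A=$(git rev-parse HEAD)
mk C

# Create A': same tree and message as A, but with B as its only parent
git replace --graft "$A" "$B"

echo "With the replacement in effect:"
git log --oneline --graph

echo "Ignoring the replacement:"
git --no-replace-objects log --oneline --graph
```

The first log should show a straight line from C through A to B, while the second still shows the diamond, since both A and A' are in the repository.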

Making the replacement permanent, shedding X and Y entirely

Once you are happy with the replacement, you may want to make it permanent. You can do this with git filter-branch, which simply copies commits. It copies starting from some start point and moving forward in history, in the reverse of Git's normal backwards "start at today and work backwards in history" manner.

When filter-branch is making its copies—and its list of what to copy—it normally does this same eye-averting thing that the rest of Git does. So if we have the history shown above, and we tell filter-branch to end on branch and start just after commit B, it will gather the existing commit list as:

E, D, C, A'

and then reverse the order. (In fact, we could stop at A' if we like, as we'll see.)

Next, filter-branch will copy A' to a new commit. This new commit will have B as its parent, the same log message as A', the same tree, the same author and date-stamps and so on—in short, it will literally be identical to A'. So it will get the same hash ID as A', and actually be commit A'.

Next, filter-branch will copy C to a new commit. This new commit will have A' as its parent, the same log message as C, and the same tree and so on. This is slightly different from the original C, whose parent is A, not A'. So this new commit gets a different hash ID: it becomes commit C'.

Next, filter-branch will copy D. This will become D', in the same way C's copy was C'.

Finally, filter-branch will copy E to E' and make branch point to E', giving us this:

       X
      / \
...--B   A--C--D--E   <-- refs/original/refs/heads/branch
     |\ /
     | Y
     \
      A'  <-- refs/replace/<complicated-thing>
       \
        C'-D'-E'  <-- branch

We can now delete the refs/replace/ name and the backup copy of refs/heads/branch that filter-branch made to save the original E. When we do that, the names get out of the way, and we can re-draw our graph:

...--B--A'-C'-D'-E'  <-- branch

which is just what we wanted (and got) from using git rebase -i, but without having to do the merge all over again.

The mechanics of filter-branch

To tell git filter-branch where to stop, use ^<hash-id> or ^<name>. Otherwise git filter-branch won't stop listing commits to copy until it runs out of commits: it will follow commit B to its parent, and to that parent's parent, and so on all the way back through history. The copies of these commits will be bit-for-bit identical to the originals, which means they will actually be the originals, same hash ID and all; but they will take a long time to make.

Since we can stop at <hash-id-of-B> or even <hash-id-of-A'>, we can use ^refs/replace/<hash> to identify commit A'. Or we can just use ^<hash-id>, which is probably actually easier.

Furthermore, we can write either ^<hash> branch or <hash>..branch. Both mean the same thing (see the gitrevisions documentation for details). So:

git filter-branch -- <hash>..branchname

suffices to do the filtering to cement the replacement into place.

If all went well, delete the refs/original/ reference as shown near the end of the git filter-branch documentation, and delete the replacement reference as well, and you are done.
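
Putting the replace and filter-branch steps together, a minimal end-to-end sketch in a scratch repository might look like this. The branch names `work` and `side` and the `mk` helper are made up; `FILTER_BRANCH_SQUELCH_WARNING=1` only silences the notice that newer Git versions print before running filter-branch.

```shell
#!/bin/sh
set -eu

# Scratch repository (hypothetical layout, for illustration only)
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/work
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: create one file and commit it with the same name
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk B; B=$(git rev-parse HEAD)
git checkout -q -b side
mk X
git checkout -q work
mk Y
git merge -q --no-ff -m A side
A=$(git rev-parse HEAD)
mk C
orig=$(git rev-parse HEAD)

# Step 1: make A' with B as its single parent
git replace --graft "$A" "$B"

# Step 2: copy the commits after B so the replacement becomes real history
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -- "$B..work"

# Step 3: remove the backup ref and the replacement ref
git update-ref -d refs/original/refs/heads/work
git replace -d "$A"

# Even with replacements ignored, the history is now linear
git --no-replace-objects log --oneline --graph work
```

The side branch still points at X in this sketch, so X stays reachable until that branch is deleted as well.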


Using cherry-pick

As an alternative to git replace, you can also use git cherry-pick to copy commits. See ElpieKay's answer for details. This is fundamentally the same idea as before, but uses the "copy commits" tool instead of the "rebase to copy commits and then hide the originals away" tool. It has one tricky step, using git reset --soft to get the index set up to match commit A to make commit A'.

torek answered Sep 18 '22 21:09