I have a branch I've been working on personally over several computers for the past few months. The result is a long history chain that I want to clean up before I merge it onto the master branch. Ultimately the goal is to get rid of all those wip commits that I frequently make when working on server code.
Here is a screenshot of the gitk history visualization:
http://imgur.com/a/I9feO
Way at the bottom of this is the point where I branched off of master. Master has changed a bit since I started this branch, but the changes have been disjoint, so the merge should be a piece of cake. My usual workflow is to rebase onto master and then squash the wip commits.
I tried to execute a simple
git rebase -i master
and I edited the commits to squash.
It seemed to start off well, but then it failed and wanted me to address a conflict. However, it seemed like there was no good way to address it by looking at the diffs. Each piece was using variables that were undefined in the scope, so I wasn't sure how to resolve them.
I also attempted using git rebase -i -s recursive -X theirs master, which didn't result in a conflict, but the resulting HEAD no longer matched the state of the original branch (I want to edit history in such a way that the end result in HEAD does not change).
I believe these conflicts arise from the parts of the chain where you can see a diamond pattern (e.g. between "reworked classifiers..." and "Merge branch iccv").
To phrase my question better, let A = "Merge branch iccv" and B = "reworked classifiers" refer to the example in the image, and let the commits in between be X and Y.
 ...
  |
  |
  A
 / \
 | X
 Y |
 \ /
  B
  |
  |
 ...
I want to rewrite history so the state of A is exactly as it is, and effectively destroy intermediate representations X and Y, so the resulting history looks like this:
...
|
|
A
|
|
B
|
|
...
Is there a way to squash the resolved state of A, X and Y into a single commit in the middle of a history chain like this?
If A and B are the SHA IDs of the commits, is there a simple command I can run (or perhaps a script) that achieves the result I want?
If A was the HEAD, I believe I could do
git reset B
git commit -am "recreating the A state"
to create a new head, but how can I do this if A is in the middle of a history chain like this? I want to maintain the history of all the nodes that come after it.
First make the current working tree clean and then run these commands:
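# initial state
git branch backup thesis4       # keep a backup of the original branch
git checkout -b tmp thesis4     # do the rewrite on a temporary branch
git reset A --hard              # move tmp back to the merge commit A
git reset B --soft              # move tmp to B while keeping A's tree staged
git commit                      # record A's tree as a single commit on top of B
git cherry-pick A..thesis4      # copy the commits that came after A
git checkout thesis4
git reset tmp --hard            # point thesis4 at the rewritten history
git branch -D tmp               # remove the temporary branch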
S is the squash of X, Y and A. M' is equivalent to M, and N' to N (the cherry-picked copies of the commits that came after A). In case you want to restore the initial state, run
git checkout thesis4
git reset backup --hard
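The recipe above can be tried in a throwaway repository before touching real work. This is a minimal sketch, assuming a toy history B, X, Y, A (merge), M, N on a branch named thesis4; the file names, the commit helper, and the tags A and B are made up for illustration, and -m is passed to git commit so that no editor opens.

```shell
set -e
# Throwaway repository; every name below is an illustrative assumption.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/thesis4
git config user.name "Demo"
git config user.email "demo@example.com"

# Helper: append a line to a file and commit it with that line as the message.
commit() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$2"; }

commit base.txt B; git tag B
git checkout -q -b side
commit x.txt X
git checkout -q thesis4
commit y.txt Y
git merge -q --no-ff -m A side; git tag A
commit m.txt M
commit n.txt N
tree_before=$(git rev-parse 'thesis4^{tree}')

# The recipe from the answer.
git branch backup thesis4
git checkout -q -b tmp thesis4
git reset -q --hard A
git reset -q --soft B
git commit -q -m "S: squash of X, Y and A"
git cherry-pick A..thesis4 >/dev/null
git checkout -q thesis4
git reset -q --hard tmp
git branch -q -D tmp

tree_after=$(git rev-parse 'thesis4^{tree}')
git log --oneline --graph thesis4
```

Comparing tree_before with tree_after confirms that the final state of the branch is unchanged, while the log should show a linear chain on top of B.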
This can be done, but it's anywhere from a bit of a pain, to a lot of pain, with the usual mechanisms.
The fundamental problem is that you must copy commits to new (slightly different) commits whenever you want to change things, because no commit can ever change.[1] The hash ID of a commit is the commit, in a very real sense: Git's hash IDs are how Git finds the underlying object. Change any bit within the object and it gets a new, different hash ID.[2] Hence, when you want to go from:
       X
      / \
...--B   A--C--D--E   <-- branch
      \ /
       Y
to something that looks like:
...--B--A--C--D--E <-- branch
the thing after B cannot be A; it has to be a different commit that just smells like A. We can call this commit A' to tell them apart:
...--B--A'-...
But if we copy A to a new, fresher-smelling (but same tree) A' that no longer has the intermediate stuff in its history (that is, A' connects directly to B), then we must also copy the first commit after A, so that the copy points back to A'. Once we do that, we must copy the commit after that one, and so on. The result is:
...--B--A'-C'-D'-E' <-- branch
[1] Psychologists like to say that change is hard, but for Git, it's literally impossible! :-)
[2] Hash collisions are technically possible, but if they occur, they mean that your repository stops adding new things. That is, if you managed to come up with a new commit that was like the old one, but had your desired change, and had the same hash ID, Git would forbid you from adding it!
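The claim that a commit's hash ID is determined by its contents, parent included, can be checked directly with the plumbing command git commit-tree. This is a small sketch in a scratch repository; the file name, messages, and pinned dates are arbitrary assumptions chosen so that the parent is the only thing that varies.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
# Pin both timestamps so repeated runs produce identical commit objects.
export GIT_AUTHOR_DATE="946684800 +0000"
export GIT_COMMITTER_DATE="946684800 +0000"

echo one > file.txt; git add file.txt; git commit -q -m first
first=$(git rev-parse HEAD)
echo two >> file.txt; git commit -q -a -m second
second=$(git rev-parse HEAD)
tree=$(git rev-parse 'HEAD^{tree}')

# Same tree, message, author and dates each time; only the parent differs.
on_first=$(git commit-tree -p "$first" -m copy "$tree")
on_first_again=$(git commit-tree -p "$first" -m copy "$tree")
on_second=$(git commit-tree -p "$second" -m copy "$tree")

echo "parent=first : $on_first"
echo "parent=first : $on_first_again"
echo "parent=second: $on_second"
```

The first two IDs should match, because identical content is the same object, while the third differs. That difference is why every commit after a rewritten one has to be copied as well.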
git rebase -i
Note: Use this method if possible; it's much easier to understand and to get right.
The standard command that copies commits like this is git rebase. However, rebase deals very poorly with merge commits like A. In fact, it normally throws them out entirely, preferring to linearize everything:
...--B--X--Y'-C'-D'-E' <-- branch
for instance.
Now, if merge commit A went well, i.e., nothing in X depends on Y or vice versa, a simple git rebase -i <hash-of-B> may suffice. You can change all but the first one of the picks for commits X and Y (which may actually be many commits) to squash, and everything just goes well and you are done: Git drops X and Y' entirely in favor of a single combined XY' commit that has the same tree your merge commit A had. The result is:
...--B--XY'-C'-D'-E' <-- branch
and if we call XY' A', and then drop all the tick marks by forgetting their original hash IDs, we get just what you wanted.
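For the easy case, the squash can be rehearsed non-interactively in a scratch repository. This is a sketch under assumptions: the toy history has exactly one commit on each side of the merge (so squashing the second todo entry into the first is enough), the file names and the commit helper are invented, and GIT_SEQUENCE_EDITOR with sed stands in for editing the todo list by hand.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch
git config user.name "Demo"
git config user.email "demo@example.com"

# Helper: append a line to a file and commit it with that line as the message.
commit() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$2"; }

commit base.txt B; B=$(git rev-parse HEAD)
git checkout -q -b side
commit x.txt X
git checkout -q branch
commit y.txt Y
git merge -q --no-ff -m A side; A=$(git rev-parse HEAD)
commit c.txt C
tip_tree=$(git rev-parse 'HEAD^{tree}')

# The todo list holds X, Y and C (the merge itself is dropped).
# Turn the second "pick" into "squash"; GIT_EDITOR=true accepts the
# combined commit message without opening an editor.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -q -i "$B"

git log --oneline --graph branch
```

Because X and Y touch different files here, the combined commit ends up with the same tree as the original merge A, and the tip of the branch keeps its tree too.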
git replace
If the merge was difficult, though, what you want is to preserve the tree from the merge, while dropping all the X and Y commits. Here git replace is the (or a) right solution. Git's replace is somewhat complicated, but you can instruct Git to make a new commit A' that is "like A but has B as its single parent hash ID". Git will now have this commit graph structure:
       X
      / \
...--B   A--C--D--E   <-- branch
     |\ /
     | Y
      \
       A'   <-- refs/replace/<complicated-thing>
This special refs/replace name tells Git that, when it is doing things like git log and other commands that use commit IDs, Git should turn its metaphorical eyes away from commit A and look instead at commit A'. Since A' is otherwise a copy of A, git checkout <hash of A> makes Git look at A' and check out the same tree; and git log shows the same log message when it looks aside at A' instead of A.
Note that both A and A' exist in the repository at this point. They are side-by-side, as it were, with Git just showing you A' instead of A unless you use the special --no-replace-objects flag. Once Git has shown you (and used) A' instead of A, it follows the backwards link from A' to B, skipping right over all of X and Y.
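One way to create such a replacement is git replace --graft, which writes a copy of a commit with the parents you list and registers it under refs/replace/. The sketch below assumes a toy history B, X, Y, A (merge), C, D on a branch named branch; the helper and file names are illustrative only.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch
git config user.name "Demo"
git config user.email "demo@example.com"

# Helper: append a line to a file and commit it with that line as the message.
commit() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$2"; }

commit base.txt B; B=$(git rev-parse HEAD)
git checkout -q -b side
commit x.txt X
git checkout -q branch
commit y.txt Y
git merge -q --no-ff -m A side; A=$(git rev-parse HEAD)
commit c.txt C
commit d.txt D

# Create A' (same tree and message as A, single parent B) and make Git
# look at it whenever it would otherwise look at A.
git replace --graft "$A" "$B"

# Count the commits after B with and without the replacement in effect.
seen=$(git rev-list --count "$B"..branch)
real=$(git --no-replace-objects rev-list --count "$B"..branch)
echo "with replacement: $seen commits after B; without: $real"
```

With the replacement active, history walks from A straight to B, so X and Y no longer appear; --no-replace-objects shows that they are still in the repository.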
Removing X and Y entirely
Once you are happy with the replacement, you may want to make it permanent. You can do this with git filter-branch, which simply copies commits. It copies starting from some start point and moving forward in history, in the reverse of Git's normal backwards "start at today and work backwards in history" manner.
When filter-branch is making its copies (and its list of what to copy), it normally does this same eye-averting thing that the rest of Git does. So if we have the history shown above, and we tell filter-branch to end on branch and start just after commit B, it will gather the existing commit list as:
E, D, C, A'
and then reverse the order. (In fact, we could stop at A' if we like, as we'll see.)
Next, filter-branch will copy A' to a new commit. This new commit will have B as its parent, the same log message as A', the same tree, the same author and date-stamps and so on. In short, it will literally be identical to A'. So it will get the same hash ID as A', and actually be commit A'.
Next, filter-branch will copy C to a new commit. This new commit will have A' as its parent, the same log message as C, and the same tree and so on. This is slightly different from the original C, whose parent is A, not A'. So this new commit gets a different hash ID: it becomes commit C'.
Next, filter-branch will copy D. This will become D', in the same way C's copy was C'.
Finally, filter-branch will copy E to E' and make branch point to E', giving us this:
       X
      / \
...--B   A--C--D--E   <-- refs/original/refs/heads/branch
     |\ /
     | Y
      \
       A'   <-- refs/replace/<complicated-thing>
        \
         C'-D'-E'   <-- branch
We can now delete the refs/replace/ name and the backup copy of refs/heads/branch that filter-branch made to save the original E. When we do that, the names get out of the way, and we can re-draw our graph:
...--B--A'-C'-D'-E' <-- branch
which is just what we wanted (and got) from using git rebase -i, but without having to do the merge all over again.
To tell git filter-branch where to stop, use ^<hash-id> or ^<name>. Otherwise git filter-branch won't stop listing commits to copy until it runs out of commits: it will follow commit B to its parent, and to that parent's parent, and so on all the way back through history. The copies of these commits will be bit-for-bit identical to the originals, which means they will actually be the originals, same hash ID and all; but they will take a long time to make.
Since we can stop at <hash-id-of-B> or even <hash-id-of-A'>, we can use ^refs/replace/<hash> to identify commit A'. Or we can just use ^<hash-id>, which is probably actually easier.
Furthermore, we can write either ^<hash> branch or <hash>..branch. Both mean the same thing (see the gitrevisions documentation for details). So:
git filter-branch -- <hash>..branchname
suffices to do the filtering to cement the replacement into place.
If all went well, delete the refs/original/ reference as shown near the end of the git filter-branch documentation, and delete the replacement reference as well, and you are done.
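Put together, the replace-then-filter sequence might look like the following in a scratch repository. It is a sketch, not a definitive procedure: the toy history (B, X, Y, merge A, C, D), the helper, and the file names are assumptions, git update-ref -d is just one way to drop the backup ref, and FILTER_BRANCH_SQUELCH_WARNING only silences the advisory that recent Git versions print before running filter-branch.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch
git config user.name "Demo"
git config user.email "demo@example.com"

# Helper: append a line to a file and commit it with that line as the message.
commit() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$2"; }

commit base.txt B; B=$(git rev-parse HEAD)
git checkout -q -b side
commit x.txt X
git checkout -q branch
commit y.txt Y
git merge -q --no-ff -m A side; A=$(git rev-parse HEAD)
commit c.txt C
commit d.txt D
tip_tree=$(git rev-parse 'branch^{tree}')

# Step 1: make A' (A's tree and message, single parent B) stand in for A.
git replace --graft "$A" "$B"

# Step 2: copy everything after B, which cements the replacement.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -- "$B"..branch >/dev/null

# Step 3: remove the replacement ref and the backup that filter-branch saved.
git replace -d "$A" >/dev/null
git update-ref -d refs/original/refs/heads/branch

git log --oneline --graph branch
```

Afterwards the branch should be a linear chain from B, with no merge commit and with the same tree at its tip as before the rewrite.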
As an alternative to git replace, you can also use git cherry-pick to copy commits. See ElpieKay's answer for details. This is fundamentally the same idea as before, but uses the "copy commits" tool instead of the "rebase to copy commits and then hide the originals away" tool. It has one tricky step, using git reset --soft to get the index set up to match commit A to make commit A'.