I've been working on a feature branch for several months now. Along the way we merged master into the branch several times. Now that the branch is done, we've realized that quite a lot of the commits on the feature branch have messy commit messages, and it would be good to squash several of them together.
However, how can I do this without landing in rebase hell? I've tried rebasing, but master has over a hundred new commits, so does our feature branch, and master has been merged into our branch at several points, so I'm running into hundreds of merge conflicts while rebasing. Is there an easier way? I just want to squash some of the commits together.
Thanks.
You can do this fairly easily without git rebase or git merge --squash, by using a soft reset. In this example, we'll squash the last 3 commits into a single new commit. The soft reset just re-points HEAD to the last commit that you do not want to squash.
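The soft-reset approach can be sketched roughly as follows. The exact commands are not shown above, so this assumes the usual form, git reset --soft HEAD~3 followed by a fresh commit, and it runs in a throwaway repository (the file name and commit messages are made up for illustration).

```shell
set -e
# Throwaway repository, so the sketch cannot touch real work.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# One base commit ("wip 0") plus three messy commits to squash.
for n in 0 1 2 3; do
    echo "change $n" >> file.txt
    git add file.txt
    git commit -q -m "wip $n"
done

# Re-point HEAD (and the current branch) three commits back. The index
# and working tree are left alone, so the combined changes stay staged.
git reset --soft HEAD~3

# Record the staged changes as one new commit.
git commit -q -m "Squashed: changes 1 through 3"

git log --oneline
```

Note that HEAD~3 follows first parents only, so if the last few commits include merges from master this counts differently than you might expect. Check git log --graph before resetting.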
You should consider using squash if your team prefers a linear project history. This means that the history held by your main branch should not contain merges. A squash merge makes it possible to keep changes condensed to a single commit, supporting this strategy nicely.
git merge --squash with a temporary branch
I'm just plagiarizing Lars Kellogg-Stedman's beautiful blog post on the subject.
I think it is much simpler and more succinct than the accepted answer.
Lars gives several options, but the first one is to do a squash merge using a temporary branch, as follows:
Slightly modifying what Lars says:
git checkout -b work master
(This creates a new branch called work and makes that your current branch.)
git merge --squash my_feature
This brings in all the changes from your my_feature branch and stages them, but does not create any commits.
git commit -m "my squash commit message"
At this point, your work branch should be identical to the original my_feature branch (running git diff my_feature should not show any changes, provided my_feature already contains everything on master), but it will have only a single commit after master.
git checkout my_feature
git reset --hard work
git push -f
git branch -D work
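To try the steps above without risk, here is a minimal sketch that replays them in a throwaway repository. The file names and commit messages are invented, and git push -f is left out because the sketch has no remote.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# A feature branch with messy history, including a merge from master.
git checkout -q -b my_feature
echo one > feature.txt
git add feature.txt
git commit -q -m "wip"
git checkout -q master
echo more > master.txt
git add master.txt
git commit -q -m "master work"
git checkout -q my_feature
git merge -q --no-edit master
echo two >> feature.txt
git commit -q -am "fix typo"

# The steps from the answer.
git checkout -q -b work master
git merge -q --squash my_feature
git commit -q -m "my squash commit message"
git diff --exit-code my_feature   # exits non-zero if the trees differ
git checkout -q my_feature
git reset -q --hard work
git branch -q -D work

git log --oneline master..my_feature
```

The git diff --exit-code line acts as a safety check: with set -e, the script stops before the hard reset if the squashed branch does not match my_feature.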
git rebase -i <commit>
Essentially, you need to find the point at which the branch you want to work on "forks off", and rebase using that as the <upstream> argument to git rebase.
The easiest way to selectively squash various commits together is to use git rebase -i (and then interactively edit the rebase instructions). I assume this is what you are doing.
The problem, of course—the one you are hitting—is that this is also git rebase. :-) If you run it naively, it not only gives you an interactive rebase that lets you fuss with the instructions and hence re-build your commit chain, it also tries to change the base of your commit chain.
Let's take a quick look at what git rebase does (interactive or not) and what I am calling the "commit chain" here.
As usual with most things Git, it's a good idea to sit down (or stand at a whiteboard, or something like that) and draw at least some part of the commit graph. The commit graph is formed (and hence drawn) in a backwards fashion. Starting from the most recent commit on some branch, we (or Git) must read each commit, and draw it as a node (with or without a name/ID attached), with an outgoing arrow from each commit to its parent commit(s).
Most commits have just one parent, and the most recent commit is the one whose ID is stored in the current branch name. Since everything in Git goes backwards, this ends up looking like:
... <- o <- o <- o <-- branch
We say that the name, branch, points to the final commit on the branch. (That final commit has a special name too: it's the tip commit.)
While all the internal arrows here are backwards, we mostly don't need to care about that, so for non-whiteboard drawings I just leave them out:
...--o--o--o <-- branch
Of course, to "rebase" this branch, it has to branch off something. That means we have a more main-line branch as well:
...--o--*--o--...--o   <-- mainline
         \
          A--B--...--Z   <-- branch
That one commit that I marked with * is the point at which the branches diverge. Commit * is on both branches, as are all earlier (leftward) commits on the main-line. Commits to the right of * are only on mainline (top row commits) or only on branch (bottom row). I've given the bottom row commits one-letter names instead of the big ugly hash IDs that Git actually uses, because the big ugly hashes are not human-friendly. Of course, that only lets me write 26 commits, not hundreds, here, but that should be fine for an example.
Now, what git rebase does is copy commits. It has to: for technical reasons, it is literally impossible to change anything about any existing commit. When you run git rebase you are doing it with the purpose of changing something. Usually, what you want to do is to change where the commits go in terms of the graph (which also changes which source tree they are based off of). Instead of the graph above, you "change" the old commits (i.e., copy them to new commits), with a result that looks like this:
...--o--*--o--...--o   <-- mainline
         \          \
          \          A'-B'-...--Z'   <-- branch (copies)
           \
            A--B--...--Z   [originals, now abandoned]
The new copies are "just as good" as the originals, or rather, even better—new and improved—because they go at a different point, and start from a different source base. They are re-based.
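A small sketch makes the "copies, not changes" point concrete: the tip of branch has a different hash ID after the rebase, while the original commit still exists in the repository. The repository layout and the commit_file helper are made up for illustration.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write one file and commit it.
commit_file() {
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit_file base.txt base "base (*)"
git checkout -q -b branch
commit_file a.txt a "A"
commit_file b.txt b "B"
git checkout -q mainline
commit_file m.txt m "mainline work"
git checkout -q branch

old_tip=$(git rev-parse branch)
git rebase -q mainline
new_tip=$(git rev-parse branch)

echo "original B: $old_tip"
echo "copy     B': $new_tip"

# The original is abandoned, not destroyed: it is still a valid commit
# object, reachable through the reflog.
git cat-file -t "$old_tip"
```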
You can sometimes just run git rebase, although sometimes you must run git rebase mainline for this. But what, exactly, does the mainline argument do? (For that matter, how is it that sometimes you can leave it out?)
That argument, mainline, actually does two things. One is quite obvious: it's where the copies go. The name mainline points to the tip commit on the mainline branch. We want the copies to go after that, so we say "rebase onto mainline".
The less-obvious thing mainline does is tell rebase what not to copy.
Remember, just a moment ago, we noted that commit * is on both branches (as are all its earlier commits). Those are the commits we don't want to copy. We want rebase to start with the first commit after *, i.e., commit A,¹ and then go on to the tip of branch, i.e., commit Z.
You can, if you need to (and sometimes you do), split this apart, telling git rebase where to add the copies with --onto, which leaves the remaining argument available for "what not to copy". In our particular case, we don't need to, though.
¹ Note that there is another commit right after *, on mainline. It's better, perhaps, to say "the first commit after * in the direction of branch". The precise definition Git uses, though, is that provided by the gitrevisions two-dot range syntax.
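Both ideas, the two-dot range and the split --onto form, can be tried in a scratch repository. This is only a sketch: the repository layout and the commit_file helper are invented, and the --onto call deliberately excludes commit A to show that the "what not to copy" argument is independent of where the copies land.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write one file and commit it.
commit_file() {
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit_file base.txt base "base (*)"
git checkout -q -b branch
commit_file a.txt a "A"
commit_file b.txt b "B"
git checkout -q mainline
commit_file m.txt m "mainline work"

# Two-dot range: commits reachable from branch but not from mainline.
# These are the ones a plain "git rebase mainline" would copy.
git log --oneline mainline..branch
to_copy=$(git rev-list --count mainline..branch)

# Split form: copies go onto mainline, but everything up to and
# including branch~1 (commit A) is excluded, so only B is copied.
git rebase -q --onto mainline branch~1 branch
copied=$(git rev-list --count mainline..branch)

echo "plain rebase would copy $to_copy commits; --onto copied $copied"
```

After the --onto rebase, commit A is no longer part of branch at all, which is rarely what you want for a squash. It is shown here only to make the two roles of the argument visible.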
In our case, for this particular problem, we'd like to do the git rebase -i pick/squash/edit thing, without actually moving the copies to a new base. That is, instead of:
...--o--*--o--...--o   <-- mainline
         \          \
          \          A'-B'-...--Z'   <-- branch (copies)
           \
            A--B--...--Z   [originals, now abandoned]
we want something more like:
...--o--*--o--...--o   <-- mainline
        |\
        | AB--CDE--F'--...--XYZ   <-- branch (copies, with squashing)
         \
          A--B--...--Z   [originals, now abandoned]
The way to do that is to name, not mainline, but commit * itself:
git rebase -i <hash-ID-of-commit-*>
This means we have to find the hash ID of commit *.
One way to do that is to draw the graph (and this is a good exercise, but it gets tedious). Another is to have Git draw the graph for you, using git log --graph (you may want to use DOG: --decorate --oneline --graph, which presents the output in a compact and friendly fashion, tail-wagging optional). In general, though, you can find the ID using git merge-base:
git merge-base mainline branch
This prints the ID of the first (i.e., most recent / most-tip-wards) commit that is on both branches. (The fact that this is called the merge base is also a clue: it's a key to how git merge works. But that's for another answer entirely.)
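Here is a sketch of both techniques in a scratch repository where mainline has been merged into branch once, as in the question. It assumes invented file names and a hypothetical commit_file helper. The point to notice is that "most recent commit on both branches" means the merge base moves forward after such a merge: it is no longer the original fork point.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write one file and commit it.
commit_file() {
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit_file base.txt base "base"
fork=$(git rev-parse HEAD)          # where branch originally forks off

git checkout -q -b branch
commit_file a.txt a "A"
git checkout -q mainline
commit_file m1.txt m1 "mainline 1"
git checkout -q branch
git merge -q --no-edit mainline     # mainline merged into branch
commit_file b.txt b "B"
git checkout -q mainline
commit_file m2.txt m2 "mainline 2"

# Let Git draw the graph (DOG), then ask for the merge base directly.
git log --decorate --oneline --graph --all
base=$(git merge-base mainline branch)
echo "merge base: $base"
```

In this layout the merge base is the "mainline 1" commit, because the merge made it part of branch as well.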
Hence, you want to do:
git merge-base your-branch the-other-branch
and then copy-and-paste the ID after a git rebase -i command, so that the rebase stops copying at the merge base and places the copies right after it. Since that's the same base as the original commits that you are copying, they won't have merge conflicts (unless you re-organize some of the commits to go in a different order, anyway).
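The whole recipe can be sketched end to end in a scratch repository. Normally you would run git rebase -i and edit the instruction list by hand. To keep the sketch non-interactive, it sets GIT_SEQUENCE_EDITOR to a sed command that turns every pick after the first into squash, and sets GIT_EDITOR=true to accept the combined commit message unchanged. The branch names, file names, and the sed one-liner are illustrative assumptions, not part of the answer above.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example"
git config user.email "example@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base (*)"

# Three messy commits on the feature branch.
git checkout -q -b branch
for n in 1 2 3; do
    echo "step $n" >> feature.txt
    git add feature.txt
    git commit -q -m "wip $n"
done

# Meanwhile, mainline moves on.
git checkout -q mainline
echo m > m.txt
git add m.txt
git commit -q -m "mainline work"
git checkout -q branch

# Rebase onto the merge base itself, so the copies stay where they are.
base=$(git merge-base mainline branch)

# Scripted stand-in for editing the todo list by hand:
# keep the first "pick", change the rest to "squash".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick /squash /'" \
GIT_EDITOR=true \
    git rebase -q -i "$base"

git log --oneline --graph --all
```

Because the base is unchanged, the squashed commit still forks off at the same place and mainline's newer work is not touched.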
One thing to note about git rebase is that it does not copy merge commits. It can't, in the general case, so mostly it does not even try. It does have a --preserve-merges flag (superseded by --rebase-merges in newer Git versions), but this actually re-performs the merges. By default, it just drops them entirely. This means that if you have a graph like:
...--o--*--o--o   <-- mainline
         \
          \      C--D
           \    /    \
            A--B      G--H   <-- branch
                \    /
                 E--F
and you ask Git to rebase branch onto mainline, the new copies are A-B-C-D-E-F-H (omitting G entirely) or A-B-E-F-C-D-H (still omitting G entirely, but doing the bottom row first instead of the top row first). This is somewhat likely to produce merge conflicts when doing the second row, whichever gets done second, and/or when adding H, especially if there were merge conflicts you had to resolve when producing G in the first place.
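The flattening is easy to observe in a scratch repository. The sketch below builds a smaller version of that graph (one commit per row instead of two, with invented file names and a hypothetical commit_file helper) and counts merge commits before and after a default rebase. Every commit touches its own file, so no conflicts arise here; real histories are rarely that kind.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write one file and commit it.
commit_file() {
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit_file base.txt base "base (*)"
git checkout -q -b branch
commit_file a.txt a "A"
git checkout -q -b bottom           # bottom row forks off after A
commit_file e.txt e "E"
git checkout -q branch
commit_file c.txt c "C"             # top row
git merge -q --no-edit bottom       # merge commit G
commit_file h.txt h "H"
git checkout -q mainline
commit_file m.txt m "mainline work"
git checkout -q branch

before=$(git rev-list --merges --count mainline..branch)
git rebase -q mainline
after=$(git rev-list --merges --count mainline..branch)

echo "merge commits before: $before, after: $after"
git log --oneline --graph mainline..branch
```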
If you run git rebase with no extra argument (not saying mainline or giving a commit ID), the command looks at the current branch's upstream setting (as set by git branch --set-upstream-to). If there is no setting at all, git rebase requires the extra argument. If there is an upstream, it's usually a remote-tracking branch like origin/branch, and rebase pretends you typed that in, except that, as of Git 2.0, using the implicit upstream turns on the fork point machinery (which is much too complex to go into here).