
Git squash without hundreds of merge conflicts?

Tags:

git

I've been working on a feature branch for several months now. Along the way we merged master into the branch several times. Now that the branch is done, we've realized quite a lot of the commits on the feature branch have messy commit messages, and it would be good to squash several of them together.

However, how can I do this without landing in rebase hell? I've tried rebasing, but master has over a hundred new commits, so does our feature branch, and master has been merged into our branch at several points, so I'm running into hundreds of merge conflicts while rebasing. Is there an easier way? I just want to squash some of the commits together.

Thanks.

thatDubstepSound asked Feb 22 '17 17:02



2 Answers

Use git merge --squash with a temporary branch

I'm just plagiarizing Lars Kellogg-Stedman's beautiful blog post on the subject.

I think it is much simpler and more succinct than the accepted answer.

Lars gives several options; the first one is to squash-merge through a temporary branch. Slightly modifying what Lars says:

  1. Check out a new branch based on master (or the appropriate base branch if your feature branch isn’t based on master):
git checkout -b work master

(This creates a new branch called work and makes that your current branch.)

  2. Bring in the changes from your messy pull request using git merge --squash:
git merge --squash my_feature

This brings in all the changes from your my_feature branch and stages them, but does not create any commits.

  3. Commit the changes with an appropriate commit message:
git commit -m "my squash commit message"

At this point, your work branch should be identical to the original my_feature branch (running git diff my_feature should not show any changes), but it will have only a single commit after master.

  4. Return to your feature branch and reset it to the squashed version:
git checkout my_feature
git reset --hard work
  5. Update your pull request:
git push -f
  6. Optionally clean up your work branch:
git branch -D work
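
To see the steps above end to end, here is a minimal sketch that runs them in a throwaway repository. The repository layout, the file names and the commit helper are all made up for illustration; only the branch names work and my_feature come from the steps. The push is left out because the scratch repository has no remote.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch explicitly
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write a file named after the commit and commit it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base                      # shared starting point on master
git checkout -q -b my_feature
commit wip1
git checkout -q master
commit upstream1                 # master moves on
git checkout -q my_feature
git merge -q --no-edit master    # master merged into the feature branch
commit wip2

# The workflow from the numbered steps.
git checkout -q -b work master
git merge -q --squash my_feature
git commit -q -m "my squash commit message"
git diff --quiet work my_feature && echo "work matches my_feature"
git checkout -q my_feature
git reset -q --hard work
git branch -q -D work

count=$(git rev-list --count master..my_feature)
echo "commits on my_feature after master: $count"
```

In this sketch master is already fully merged into my_feature, so the squash merge cannot conflict. If master has moved on since the last merge, the squash merge may still ask you to resolve conflicts once.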
Zephaniah Grunschlag answered Sep 22 '22 21:09


TL;DR summary: git rebase -i <commit>

Essentially, you need to find the point at which the branch you want to work on "forks off", and rebase using that as the <upstream> argument to git rebase.

Explanation

The easiest way to selectively squash various commits together is to use git rebase -i (and then interactively edit the rebase instructions). I assume this is what you are doing.

The problem, of course—the one you are hitting—is that this is also git rebase. :-) If you run it naively, it not only gives you an interactive rebase that lets you fuss with the instructions and hence re-build your commit chain, it also tries to change the base of your commit chain.

Let's take a quick look at what git rebase does (interactive or not) and what I am calling the "commit chain" here.

Drawing the graph

As usual with most things Git, it's a good idea to sit down (or stand at a whiteboard, or something like that) and draw at least some part of the commit graph. The commit graph is formed (and hence drawn) in a backwards fashion. Starting from the most recent commit on some branch, we (or Git) must read each commit, and draw it as a node (with or without a name/ID attached), with an outgoing arrow from each commit to its parent commit(s).

Most commits have just one parent, and the most recent commit is the one whose ID is stored in the current branch name. Since everything in Git goes backwards, this ends up looking like:

... <- o <- o <- o    <-- branch

We say that the name, branch, points to the final commit on the branch. (That final commit has a special name too: it's the tip commit.)

While all the internal arrows here are backwards, we mostly don't need to care about that, so for non-whiteboard drawings I just leave them out:

...--o--o--o   <-- branch

Of course, to "rebase" this branch, it has to branch off something. That means we have a more main-line branch as well:

...--o--*--o--...--o     <-- mainline
         \
          A--B--...--Z   <-- branch

That one commit that I marked with * is the point at which the branches diverge. Commit * is on both branches, as are all earlier (leftward) commits on the main-line. Commits to the right of * are only on mainline (top row commits) or only on branch (bottom row). I've given the bottom row commits one-letter names instead of the big ugly hash IDs that Git actually uses, because the big ugly hashes are not human-friendly. Of course, that only lets me write 26 commits, not hundreds, here, but that should be fine for an example.
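
If you want to check which commits are "only on branch" or "only on mainline" without drawing anything, the two-dot range syntax can list them. A small sketch in a scratch repository follows; the commit names (base, star, A, B, M1) and the commit helper are invented for the example, with star standing in for commit *.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/mainline   # name the initial branch explicitly
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write a file named after the commit and commit it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
commit star                      # plays the role of commit * in the drawing
git checkout -q -b branch
commit A
commit B
git checkout -q mainline
commit M1

# Commits reachable from branch but not from mainline (bottom row).
only_branch=$(git log --reverse --format=%s mainline..branch | paste -sd' ' -)
# Commits reachable from mainline but not from branch (top row).
only_mainline=$(git log --reverse --format=%s branch..mainline | paste -sd' ' -)

echo "only on branch:   $only_branch"
echo "only on mainline: $only_mainline"
```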

What rebase does

Now, what git rebase does is copy commits. It has to: for technical reasons, it is literally impossible to change anything about any existing commit. When you run git rebase you are doing it with the purpose of changing something. Usually, what you want to do is to change where the commits go in terms of the graph (which also changes which source tree they are based off of). Instead of the graph above, you "change"—i.e., copy to new commits—the old commits, with a result that looks like this:

...--o--*--o--...--o                <-- mainline
         \          \
          \          A'-B'-...--Z'  <-- branch (copies)
           \
            A--B--...--Z            [originals, now abandoned]

The new copies are "just as good" as the originals, or rather, even better—new and improved—because they go at a different point, and start from a different source base. They are re-based.
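
You can watch the copying happen by recording the tip hash before and after a rebase. This is only a sketch in a scratch repository with made-up commit names and a made-up commit helper: the hash changes, the copy sits on top of mainline, and the original commit object is still in the repository, just no longer on the branch.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/mainline   # name the initial branch explicitly
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write a file named after the commit and commit it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git checkout -q -b branch
commit A
git checkout -q mainline
commit M1                        # mainline moves on after the fork

old=$(git rev-parse branch)      # original A
git checkout -q branch
git rebase -q mainline
new=$(git rev-parse branch)      # the copy, A'

echo "original: $old"
echo "copy:     $new"
echo "original still exists as a: $(git cat-file -t "$old")"
```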

What gets copied, and where?

You can sometimes just run git rebase, although sometimes you must run git rebase mainline for this. But what, exactly, does the mainline argument do? (For that matter, how is it that sometimes you can leave it out?)

That argument—mainline—actually does two things. One is quite obvious: it's where the copies go. The name mainline points to the tip commit on the mainline branch. We want the copies to go after that, so we say "rebase onto mainline".

The less-obvious thing mainline does is tell rebase what not to copy.

Remember, just a moment ago, we noted that commit * is on both branches (as are all its earlier commits). Those are the commits we don't want to copy. We want rebase to start with the first commit after *, i.e., commit A (see footnote 1), and then go on to the tip of branch, i.e., commit Z.

You can, if you need to—sometimes you do—split this apart, telling git rebase where to add the copies with --onto, which leaves the remaining argument available for "what not to copy". In our particular case, we don't need to, though.
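
For completeness, here is a rough sketch of the split form, where --onto says where the copies go and the remaining argument says what not to copy. The scratch repository, the commit names and the commit helper are assumptions for the example. It copies A and B so that they land on M1 rather than on the tip of mainline.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/mainline   # name the initial branch explicitly
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write a file named after the commit and commit it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
commit star                      # the fork point, "*" in the drawings
git checkout -q -b branch
commit A
commit B
git checkout -q mainline
commit M1
commit M2

fork=$(git merge-base mainline branch)
# "What not to copy" is everything up to $fork; the copies go on top of M1.
git rebase -q --onto mainline~1 "$fork" branch
new_base=$(git merge-base mainline branch)
git log --oneline --graph --all
```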


1. Note that there is another commit right after *, on mainline. It's better, perhaps, to say "the first commit after * in the direction of branch". The precise definition Git uses, though, is that provided by the gitrevisions two-dot range syntax.


What we want to do is "copy in place"

In our case, for this particular problem, we'd like to do the git rebase -i pick/squash/edit thing, without actually moving the copies to a new base. That is, instead of:

...--o--*--o--...--o                <-- mainline
         \          \
          \          A'-B'-...--Z'  <-- branch (copies)
           \
            A--B--...--Z            [originals, now abandoned]

we want something more like:

...--o--*--o--...--o              <-- mainline
        |\
        | AB--CDE--F'--...--XYZ   <-- branch (copies, with squashing)
        \
         A--B--...--Z             [originals, now abandoned]

The way to do that is to name, not mainline, but commit * itself:

git rebase -i <hash-ID-of-commit-*>

This means we have to find the hash ID of commit *.

One way to do that is to draw the graph (and this is a good exercise, but it gets tedious). Another is to have Git draw the graph for you, using git log --graph (you may want to use DOG: --decorate --oneline --graph, which presents the output in a compact and friendly fashion, tail-wagging optional). In general, though, you can find the ID using git merge-base:

git merge-base mainline branch

This prints the ID of the first (i.e., most recent / most-tip-wards) commit that is on both branches. (The fact that this is called the merge base is also a clue: it's a key to how git merge works. But that's for another answer entirely.)
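
A quick way to convince yourself that git merge-base finds commit * is to try it in a scratch repository. The commit names and the commit helper in this sketch are invented; star stands in for commit *.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/mainline   # name the initial branch explicitly
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write a file named after the commit and commit it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
commit star                      # the last commit that is on both branches
git checkout -q -b branch
commit A
commit B
git checkout -q mainline
commit M1

fork=$(git merge-base mainline branch)
fork_subject=$(git log -1 --format=%s "$fork")
echo "merge base is $fork ($fork_subject)"

# Let Git draw the graph as well (the "DOG" options, plus --all).
git log --decorate --oneline --graph --all
```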

Hence, you want to do:

git merge-base your-branch the-other-branch

and then copy-and-paste the ID after a git rebase -i command, so that the rebase limits the copies at, and goes right after, the merge base. Since that's the same base as the original commits that you are copying, they won't have merge conflicts (unless you re-organize some of the commits to go in a different order, anyway).
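
Here is a sketch of that "copy in place" rebase, scripted so it runs without anyone typing into an editor. Normally you would edit the instruction sheet by hand; for the demo, GIT_SEQUENCE_EDITOR is pointed at a sed command that turns every pick after the first into squash, and GIT_EDITOR=true accepts the combined commit message as is. The scratch repository, the commit names and the commit helper are assumptions for illustration.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/mainline   # name the initial branch explicitly
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write a file named after the commit and commit it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit fork                      # plays the role of commit *
git checkout -q -b branch
commit A
commit B
commit C
git checkout -q mainline
commit M1                        # mainline moves on; we do not want to move onto it
git checkout -q branch

base=$(git merge-base mainline branch)
tree_before=$(git rev-parse 'branch^{tree}')

# Stand-in for hand-editing the todo list: keep the first "pick",
# change the rest to "squash".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -q -i "$base"

tree_after=$(git rev-parse 'branch^{tree}')
count=$(git rev-list --count mainline..branch)
echo "commits on branch after the merge base: $count"
```

The branch still starts at the same merge base and its final tree is unchanged, which is why no conflicts show up here: only the number of commits differs.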

Last minute notes and caveats

One thing to note about git rebase is that it does not copy merge commits. It can't, in the general case, so mostly it does not even try. It does have a --preserve-merges flag (deprecated in newer Git releases in favor of --rebase-merges, and later removed), but this actually re-performs the merges. By default, though, it just drops them entirely. This means that if you have a graph like:

...--o--*--o--o              <-- mainline
         \
          \      C--D
           \    /    \
            A--B      G--H   <-- branch
                \    /
                 E--F

and you ask Git to rebase branch onto mainline, the new copies are A-B-C-D-E-F-H (omitting G entirely) or A-B-E-F-C-D-H (still omitting G entirely, but doing the bottom row first instead of the top row first). This is somewhat likely to produce merge conflicts when doing the second row, whichever gets done second, and/or when adding H, especially if there were merge conflicts you had to resolve when producing G in the first place.
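
A small sketch of the merge being dropped, using a cut-down version of that graph in a scratch repository. The names (A, C, E, H, the side branch) and the commit helper are invented, and every commit touches its own file so that nothing conflicts in this toy case.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/mainline   # name the initial branch explicitly
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write a file named after the commit and commit it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git checkout -q -b branch
commit A
git checkout -q -b side          # bottom row of the diamond
commit E
git checkout -q branch           # top row of the diamond
commit C
git merge -q --no-edit side      # the merge commit, G in the drawing
commit H
git checkout -q mainline
commit M1
git checkout -q branch

merges_before=$(git rev-list --merges --count mainline..branch)
git rebase -q mainline
merges_after=$(git rev-list --merges --count mainline..branch)
commits_after=$(git rev-list --count mainline..branch)

echo "merge commits before: $merges_before, after: $merges_after"
echo "commits copied: $commits_after"
```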

If you run git rebase with no extra argument—not saying mainline or giving a commit ID—the command looks at the current branch's upstream setting (as set by git branch --set-upstream-to). If there is no setting at all, git rebase requires the extra argument. If there is an upstream, it's usually a remote-tracking branch like origin/branch, and rebase pretends you typed that in—except that, in Git 2.0, using the implicit upstream turns on the fork point machinery (which is much too complex to go into here).
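
To illustrate the no-argument form, here is a sketch that sets an upstream and then runs a bare git rebase. To stay self-contained it uses a local branch (mainline) as the upstream rather than a remote-tracking branch such as origin/branch; that substitution, the commit names and the commit helper are assumptions for the example.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/mainline   # name the initial branch explicitly
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write a file named after the commit and commit it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git checkout -q -b branch
commit A
git checkout -q mainline
commit M1
git checkout -q branch

# Record mainline as the upstream of the current branch.
git branch -q --set-upstream-to=mainline
upstream=$(git rev-parse --abbrev-ref '@{upstream}')
echo "upstream of branch: $upstream"

# No argument: rebase falls back to the configured upstream.
git rebase -q
```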

torek answered Sep 23 '22 21:09