
Git workflow and rebase vs merge questions

I've been using Git now for a couple of months on a project with one other developer. I have several years of experience with SVN, so I guess I bring a lot of baggage to the relationship.

I have heard that Git is excellent for branching and merging, and so far, I just don't see it. Sure, branching is dead simple, but when I try to merge, everything goes all to hell. Now, I'm used to that from SVN, but it seems to me that I just traded one sub-par versioning system for another.

My partner tells me that my problems stem from my desire to merge willy-nilly, and that I should be using rebase instead of merge in many situations. For example, here's the workflow that he's laid down:

    clone the remote repository
    git checkout -b my_new_feature
    ..work and commit some stuff
    git rebase master
    ..work and commit some stuff
    git rebase master
    ..finish the feature
    git checkout master
    git merge my_new_feature

Essentially, create a feature branch, ALWAYS rebase from master to the branch, and merge from the branch back to master. Important to note is that the branch always stays local.

Here is the workflow that I started with

    clone remote repository
    create my_new_feature branch on remote repository
    git checkout --track -b my_new_feature origin/my_new_feature
    ..work, commit, push to origin/my_new_feature
    git merge master (to get some changes that my partner added)
    ..work, commit, push to origin/my_new_feature
    git merge master
    ..finish my_new_feature, push to origin/my_new_feature
    git checkout master
    git merge my_new_feature
    delete remote branch
    delete local branch

There are two essential differences (I think): I use merge always instead of rebasing, and I push my feature branch (and my feature branch commits) to the remote repository.

My reasoning for the remote branch is that I want my work backed up as I'm working. Our repository is automatically backed up and can be restored if something goes wrong. My laptop is not, or not as thoroughly. Therefore, I hate to have code on my laptop that's not mirrored somewhere else.

My reasoning for the merge instead of rebase is that merge seems to be standard and rebase seems to be an advanced feature. My gut feeling is that what I'm trying to do is not an advanced setup, so rebase should be unnecessary. I've even perused the new Pragmatic Programming book on Git, and they cover merge extensively and barely mention rebase.

Anyway, I was following my workflow on a recent branch, and when I tried to merge it back to master, it all went to hell. There were tons of conflicts with things that should not have mattered. The conflicts just made no sense to me. It took me a day to sort everything out, and it eventually culminated in a forced push to the remote master, since my local master had all conflicts resolved, but the remote one still wasn't happy.

What is the "correct" workflow for something like this? Git is supposed to make branching and merging super-easy, and I'm just not seeing it.

Update 2011-04-15

This seems to be a very popular question, so I thought I'd update with my two years' experience since I first asked.

It turns out that the original workflow (the one my partner laid down, rebasing from master onto the feature branch) is correct, at least in our case. In other words, this is what we do and it works:

    clone the remote repository
    git checkout -b my_new_feature
    ..work and commit some stuff
    git rebase master
    ..work and commit some stuff
    git rebase master
    ..finish the feature, commit
    git rebase master
    git checkout master
    git merge my_new_feature

In fact, our workflow is a little different, as we tend to do squash merges instead of raw merges. (Note: This is controversial, see below.) This allows us to turn our entire feature branch into a single commit on master. Then we delete our feature branch. This allows us to logically structure our commits on master, even if they're a little messy on our branches. So, this is what we do:

    clone the remote repository
    git checkout -b my_new_feature
    ..work and commit some stuff
    git rebase master
    ..work and commit some stuff
    git rebase master
    ..finish the feature, commit
    git rebase master
    git checkout master
    git merge --squash my_new_feature
    git commit -m "added my_new_feature"
    git branch -D my_new_feature

Squash Merge Controversy - As several commenters have pointed out, the squash merge will throw away all history on your feature branch. As the name implies, it squashes all the commits down into a single one. For small features, this makes sense as it condenses it down into a single package. For larger features, it's probably not a great idea, especially if your individual commits are already atomic. It really comes down to personal preference.
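To make the trade-off concrete, here is a minimal sketch of the squash merge in a throwaway repository. The file name, commit messages and identity settings are made up for illustration; only the `git merge --squash` / `git commit` / `git branch -D` sequence comes from the workflow above.

```shell
# Throwaway repository; every name below is illustrative.
demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "initial commit"

# Three messy work-in-progress commits on the feature branch.
git checkout -q -b my_new_feature
echo "step 1" >> app.txt && git commit -q -am "wip: step 1"
echo "step 2" >> app.txt && git commit -q -am "wip: step 2"
echo "step 3" >> app.txt && git commit -q -am "wip: step 3"

# Squash them into a single commit on master, then drop the branch.
git checkout -q master
git merge --squash my_new_feature >/dev/null
git commit -q -m "added my_new_feature"
git branch -q -D my_new_feature

master_commits=$(git rev-list --count master)
echo "commits on master: $master_commits"
git log --oneline
```

The three "wip" commits no longer appear in `git log` on master, which is exactly what the commenters object to for larger features.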

GitHub and Bitbucket (others?) Pull Requests - In case you're wondering how merge/rebase relates to Pull Requests, I recommend following all the above steps up until you're ready to merge back to master. Instead of manually merging with git, you just accept the PR. Note that this will not do a squash merge (at least not by default), but non-squash, non-fast-forward is the accepted merge convention in the Pull Request community (as far as I know). Specifically, it works like this:

    clone the remote repository
    git checkout -b my_new_feature
    ..work and commit some stuff
    git rebase master
    ..work and commit some stuff
    git rebase master
    ..finish the feature, commit
    git rebase master
    git push # May need to force push
    ...submit PR, wait for a review, make any changes requested for the PR
    git rebase master
    git push # Will probably need to force push (-f), due to previous rebases from master
    ...accept the PR, most likely also deleting the feature branch in the process
    git checkout master
    git branch -d my_new_feature
    git remote prune origin
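The "may need to force push" step is where people get nervous, so here is a small sketch of why the plain push is rejected after a rebase. It uses a local bare repository as a stand-in for GitHub or Bitbucket, and it swaps the `-f` from the steps above for `git push --force-with-lease`, which is my substitution, not part of the original workflow: it refuses to overwrite the remote branch if someone else has pushed to it since you last fetched.

```shell
# A local bare repository plays the role of the hosted remote (illustrative setup).
demo=$(mktemp -d) && cd "$demo"
git init -q --bare origin.git
git init -q work && cd work
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin ../origin.git

echo "base" > app.txt
git add app.txt && git commit -q -m "initial commit"
git push -q origin master

# Feature branch, pushed once so a PR could be opened from it.
git checkout -q -b my_new_feature
echo "feature" > feature.txt
git add feature.txt && git commit -q -m "add feature"
git push -q origin my_new_feature

# master moves on, and we rebase the feature branch on top of it.
git checkout -q master
echo "more" >> app.txt && git commit -q -am "master moves on"
git push -q origin master
git checkout -q my_new_feature
git rebase -q master

# The rebased branch is no longer a descendant of what was pushed before.
if git push -q origin my_new_feature 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi
echo "plain push: $plain_push"

# Force the update, but only if the remote branch is still where we last saw it.
git push -q --force-with-lease origin my_new_feature
```

This is only safe because nobody else is building on `my_new_feature`; the lease protects against overwriting their pushes, not against confusing them.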

I've come to love Git and never want to go back to SVN. If you're struggling, just stick with it and eventually you'll see the light at the end of the tunnel.

asked Jan 19 '09 at 15:01 by Micah



2 Answers

TL;DR

A git rebase workflow does not protect you from people who are bad at conflict resolution or people who are used to an SVN workflow, as suggested in Avoiding Git Disasters: A Gory Story. It only makes conflict resolution more tedious for them and makes it harder to recover from bad conflict resolution. Instead, use diff3 so that it's not so difficult in the first place.


Rebase workflow is not better for conflict resolution!

I am very pro-rebase for cleaning up history. However if I ever hit a conflict, I immediately abort the rebase and do a merge instead! It really kills me that people are recommending a rebase workflow as a better alternative to a merge workflow for conflict resolution (which is exactly what this question was about).

If it goes "all to hell" during a merge, it will go "all to hell" during a rebase, and potentially a lot more hell too! Here's why:

Reason #1: Resolve conflicts once, instead of once for each commit

When you rebase instead of merge, you will have to perform conflict resolution up to as many times as you have commits to rebase, for the same conflict!

Real scenario

I branch off of master to refactor a complicated method in a branch. My refactoring work is comprised of 15 commits total as I work to refactor it and get code reviews. Part of my refactoring involves fixing the mixed tabs and spaces that were present in master before. This is necessary, but unfortunately it will conflict with any change made afterward to this method in master. Sure enough, while I'm working on this method, someone makes a simple, legitimate change to the same method in the master branch that should be merged in with my changes.

When it's time to merge my branch back with master, I have two options:

git merge: I get a conflict. I see the change they made to master and merge it in with (the final product of) my branch. Done.

git rebase: I get a conflict with my first commit. I resolve the conflict and continue the rebase. I get a conflict with my second commit. I resolve the conflict and continue the rebase. I get a conflict with my third commit. I resolve the conflict and continue the rebase. I get a conflict with my fourth commit. I resolve the conflict and continue the rebase. I get a conflict with my fifth commit. I resolve the conflict and continue the rebase. I get a conflict with my sixth commit. I resolve the conflict and continue the rebase. I get a conflict with my seventh commit. I resolve the conflict and continue the rebase. I get a conflict with my eighth commit. I resolve the conflict and continue the rebase. I get a conflict with my ninth commit. I resolve the conflict and continue the rebase. I get a conflict with my tenth commit. I resolve the conflict and continue the rebase. I get a conflict with my eleventh commit. I resolve the conflict and continue the rebase. I get a conflict with my twelfth commit. I resolve the conflict and continue the rebase. I get a conflict with my thirteenth commit. I resolve the conflict and continue the rebase. I get a conflict with my fourteenth commit. I resolve the conflict and continue the rebase. I get a conflict with my fifteenth commit. I resolve the conflict and continue the rebase.

You have got to be kidding me if this is your preferred workflow. All it takes is a whitespace fix that conflicts with one change made on master, and every commit will conflict and must be resolved. And this is a simple scenario with only a whitespace conflict. Heaven forbid you have a real conflict involving major code changes across files and have to resolve that multiple times.
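If you want to see this for yourself, here is a rough sketch that reproduces a scaled-down version of the scenario (3 commits instead of 15) in a throwaway repository and counts how often each approach stops on a conflict. The file name, branch names and the "resolution" text are invented for the demo; each resolution keeps both sides, which is what makes the next commit conflict again.

```shell
demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "original" > method.txt
git add method.txt && git commit -q -m "initial commit"

# Feature branch: three commits that each rewrite the same line.
git checkout -q -b refactor
for n in 1 2 3; do
  echo "refactor step $n" > method.txt
  git commit -q -am "refactor step $n"
done

# Meanwhile, someone changes that same line on master.
git checkout -q master
echo "master fix" > method.txt
git commit -q -am "fix on master"

# Option 1: merge. The conflict is resolved once, against the final result.
git checkout -q -b merge_attempt master
merge_conflicts=0
if ! git merge refactor >/dev/null 2>&1; then
  merge_conflicts=1
  echo "master fix + refactor step 3" > method.txt
  git add method.txt
  git commit -q -m "merge refactor"
fi

# Option 2: rebase. Every replayed commit stops on the same line.
git checkout -q refactor
rebase_conflicts=0
git rebase master >/dev/null 2>&1 || true
while { [ -d .git/rebase-merge ] || [ -d .git/rebase-apply ]; } && [ "$rebase_conflicts" -lt 10 ]; do
  rebase_conflicts=$((rebase_conflicts + 1))
  echo "master fix + refactor step $rebase_conflicts" > method.txt
  git add method.txt
  GIT_EDITOR=true git rebase --continue >/dev/null 2>&1 || true
done

echo "merge: $merge_conflicts conflict(s), rebase: $rebase_conflicts conflict(s)"
```

With this setup the script should report one conflict for the merge and three for the rebase, one per replayed commit.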

With all the extra conflict resolution you need to do, it just increases the possibility that you will make a mistake. But mistakes are fine in git since you can undo, right? Except of course...

Reason #2: With rebase, there is no undo!

I think we can all agree that conflict resolution can be difficult, and also that some people are very bad at it. It can be very prone to mistakes, which is why it's so great that git makes it easy to undo!

When you merge a branch, git creates a merge commit that can be discarded or amended if the conflict resolution goes poorly. Even if you have already pushed the bad merge commit to the public/authoritative repo, you can use git revert to undo the changes introduced by the merge and redo the merge correctly in a new merge commit.
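As a sketch of the "already pushed" case, the following throwaway repository makes a merge commit and then undoes it with `git revert -m 1`, where `-m 1` tells git to keep the first parent (master's side) as the mainline. Branch and file names are placeholders.

```shell
demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt && git commit -q -m "initial commit"

git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt && git commit -q -m "add feature"

# A real merge commit (no fast-forward), standing in for the badly resolved merge.
git checkout -q master
git merge -q --no-ff -m "merge feature" feature

# Undo what the merge brought in without rewriting published history.
git revert --no-edit -m 1 HEAD >/dev/null

git log --oneline
```

One caveat worth knowing: after reverting a merge, git still considers the feature branch's commits merged, so redoing the merge later generally means reverting the revert first (or recreating the commits).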

When you rebase a branch, in the likely event that conflict resolution is done wrong, you're screwed. Every commit now contains the bad merge, and you can't just redo the rebase*. At best, you have to go back and amend each of the affected commits. Not fun.

After a rebase, it's impossible to determine what was originally part of the commits and what was introduced as a result of bad conflict resolution.

*It can be possible to undo a rebase if you can dig the old commits out of git's reflog (git reflog), or if you create a third branch that points to the last commit before rebasing.
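Both escape hatches from the footnote can be sketched in a few lines. The branch name `feature_backup` is my own placeholder, and the reflog lookup assumes the branch reflog has not been expired or touched by anything else since the rebase.

```shell
demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt && git commit -q -m "initial commit"

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt && git commit -q -m "add feature"

git checkout -q master
echo "more" >> app.txt && git commit -q -am "master moves on"

git checkout -q feature
before=$(git rev-parse HEAD)

# Safety net: a third branch pointing at the last commit before rebasing.
git branch feature_backup
git rebase -q master
after_rebase=$(git rev-parse HEAD)

# Without a backup branch, the previous tip is still recorded in the branch's reflog.
from_reflog=$(git rev-parse "feature@{1}")

# Throw the rebase away and go back to the pre-rebase commits.
git reset -q --hard feature_backup
restored=$(git rev-parse HEAD)
```

This restores the whole branch; it does not help you pick apart a single bad resolution buried in the rebased commits.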

Take the hell out of conflict resolution: use diff3

Take this conflict for example:

    <<<<<<< HEAD
    TextMessage.send(:include_timestamp => true)
    =======
    EmailMessage.send(:include_timestamp => false)
    >>>>>>> feature-branch

Looking at the conflict, it's impossible to tell what each branch changed or what its intent was. This is the biggest reason in my opinion why conflict resolution is confusing and hard.

diff3 to the rescue!

git config --global merge.conflictstyle diff3 

When you use diff3, each new conflict will have a third section: the merged common ancestor.

    <<<<<<< HEAD
    TextMessage.send(:include_timestamp => true)
    ||||||| merged common ancestor
    EmailMessage.send(:include_timestamp => true)
    =======
    EmailMessage.send(:include_timestamp => false)
    >>>>>>> feature-branch

First examine the merged common ancestor. Then compare each side to determine each branch's intent. You can see that HEAD changed EmailMessage to TextMessage. Its intent is to change the class used to TextMessage, passing the same parameters. You can also see that feature-branch's intent is to pass false instead of true for the :include_timestamp option. To merge these changes, combine the intent of both:

TextMessage.send(:include_timestamp => false) 
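You can reproduce this exact conflict without setting up two branches by feeding the three versions to `git merge-file`, which does a three-way merge on plain files. This is just a way to experiment with the diff3 output; the file names are made up and the labels are passed in by hand to match the example above.

```shell
demo=$(mktemp -d) && cd "$demo"
git init -q .

# The three versions from the example: common ancestor, HEAD, and feature-branch.
echo 'EmailMessage.send(:include_timestamp => true)'  > ancestor.rb
echo 'TextMessage.send(:include_timestamp => true)'   > head.rb
echo 'EmailMessage.send(:include_timestamp => false)' > feature.rb

# -p prints the result instead of overwriting head.rb; --diff3 adds the ancestor section.
# merge-file exits non-zero when there are conflicts, so that status is expected here.
conflict=$(git merge-file -p --diff3 \
  -L HEAD -L "merged common ancestor" -L feature-branch \
  head.rb ancestor.rb feature.rb) || true

printf '%s\n' "$conflict"
```

Dropping `--diff3` from the command shows the default two-section conflict, which makes the difference easy to compare side by side.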

In general:

  1. Compare the common ancestor with each branch, and determine which branch has the simplest change
  2. Apply that simple change to the other branch's version of the code, so that it contains both the simpler and the more complex change
  3. Remove all the sections of conflict code other than the one that you just merged the changes together into

Alternate: Resolve by manually applying the branch's changes

Finally, some conflicts are terrible to understand even with diff3. This happens especially when diff finds lines in common that are not semantically common (e.g. both branches happened to have a blank line at the same place!). For example, one branch changes the indentation of the body of a class or reorders similar methods. In these cases, a better resolution strategy can be to examine the change from either side of the merge and manually apply the diff to the other file.

Let's look at how we might resolve a conflict in a scenario where we are merging origin/feature1 and lib/message.rb conflicts.

  1. Decide whether our currently checked out branch (HEAD, or --ours) or the branch we're merging (origin/feature1, or --theirs) is a simpler change to apply. Using diff with triple dot (git diff a...b) shows the changes that happened on b since its last divergence from a, or in other words, compare the common ancestor of a and b with b.

    git diff HEAD...origin/feature1 -- lib/message.rb # show the change in feature1
    git diff origin/feature1...HEAD -- lib/message.rb # show the change in our branch
  2. Check out the more complicated version of the file. This will remove all conflict markers and use the side you choose.

    git checkout --ours -- lib/message.rb   # if our branch's change is more complicated
    git checkout --theirs -- lib/message.rb # if origin/feature1's change is more complicated
  3. With the complicated change checked out, pull up the diff of the simpler change (see step 1). Apply each change from this diff to the conflicting file.
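Here is a small sketch of steps 1 and 2 in a throwaway repository. A local branch named `feature1` stands in for origin/feature1, and the file contents are invented. It also checks the claim about the triple-dot form: `git diff a...b` gives the same output as diffing the merge base of a and b against b.

```shell
demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir lib
printf 'def send\n  deliver\nend\n' > lib/message.rb
git add lib && git commit -q -m "initial commit"

# Both branches change the same line, so the merge will conflict.
git checkout -q -b feature1
printf 'def send\n  deliver(:fast)\nend\n' > lib/message.rb
git commit -q -am "feature1: fast delivery"

git checkout -q master
printf 'def send\n  deliver(:logged)\nend\n' > lib/message.rb
git commit -q -am "master: logged delivery"

# Step 1: the triple-dot diff is the change on feature1 since it diverged.
three_dot=$(git diff master...feature1 -- lib/message.rb)
base=$(git merge-base master feature1)
explicit=$(git diff "$base" feature1 -- lib/message.rb)

# Step 2: start the merge, then check out one side to drop the conflict markers.
merge_status=clean
git merge feature1 >/dev/null 2>&1 || merge_status=conflict
git checkout --ours -- lib/message.rb

# Step 3 is manual: apply the change shown in $three_dot to the file by hand.
printf '%s\n' "$three_dot"
```

After step 3 you would `git add lib/message.rb` and commit to conclude the merge as usual.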

answered Sep 29 '22 at 01:09 by Edward Anderson


"Conflicts" mean "parallel evolutions of the same content". So if it goes "all to hell" during a merge, it means you have massive evolutions on the same set of files.

The reason why a rebase is then better than a merge is that:

  • you rewrite your local commit history on top of master's (and then reapply your work, resolving any conflicts at that point)
  • the final merge will certainly be a "fast forward" one, because it will have all the commit history of the master, plus only your changes to reapply.

I confirm that the correct workflow in that case (evolutions on common set of files) is rebase first, then merge.
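A quick way to convince yourself of the fast-forward claim is to rebase in a throwaway repository and then merge with `--ff-only`, which refuses to do anything other than a fast-forward. The file names are placeholders, and the sketch assumes master has not moved again between the rebase and the merge.

```shell
demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt && git commit -q -m "initial commit"

git checkout -q -b my_new_feature
echo "feature" > feature.txt
git add feature.txt && git commit -q -m "add feature"

# master gets a new commit while the feature is in progress.
git checkout -q master
echo "more" >> app.txt && git commit -q -am "master moves on"

# Rebase first: the feature commit is replayed on top of the current master.
git checkout -q my_new_feature
git rebase -q master

# Then merge: master only has to move its pointer forward.
git checkout -q master
git merge -q --ff-only my_new_feature

merge_commits=$(git rev-list --merges --count master)
echo "merge commits on master: $merge_commits"
git log --oneline
```

The resulting history is a straight line with no merge commit, which is the "cleaner history" argument for this workflow.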

However, that means that if you push your local branch (for backup reasons), that branch should not be pulled (or at least used) by anyone else, since its commit history will be rewritten by each successive rebase.


On that topic (rebase then merge workflow), barraponto mentions in the comments two interesting posts, both from randyfay.com:

  • A Rebase Workflow for Git: reminds us to fetch first, then rebase:

Using this technique, your work always goes on top of the public branch like a patch that is up-to-date with current HEAD.

(a similar technique exists for bazaar)

  • Avoiding Git Disasters: A Gory Story: about the dangers of git push --force (instead of a git pull --rebase for instance)

answered Sep 29 '22 at 01:09 by VonC