
GIT: How do I rebase nested branches?

Tags: git, merge

My structure looks like this:

master
  develop 
    project
      <sprint_number>
        <task_number>

I work on the task_number branch. Then I merge task with the sprint branch. Then I merge sprint with the project branch. In this way, all of the commits on project are sprints, and all of the commits on sprint are tasks. After merging into project branch, I submit a merge request and a code review is performed before merging into develop.

Should I do a rebase all the way down the chain? For example:

git checkout develop
git rebase master
git checkout project
git rebase develop
git checkout <sprint_number>
git rebase project
git checkout <task_number>
git rebase <sprint_number>
asked Dec 25 '22 07:12 by Musical Shore


1 Answer

Git branch names don't actually nest in any sense: they're just pointers to specific commits.

First, draw (part of) the commit DAG

What we need to do here, as usual, is draw some commit Directed Acyclic Graph (DAG) fragments and consider cases where rebasing makes sense. So we start with your example:

master
  develop 
    project
      <sprint_number>
        <task_number>

and add some nodes (naming each with a single uppercase letter instead of its "true name" hash like a1cf93a... since those are too big and unwieldy):

A <- B <- C                <-- master
      \
       D <- E              <-- develop
             \
              F <- G       <-- project
                    \
                     H     <-- <sprint_number>
                      \
                       I   <-- <task_number>

(the backslashes here should be up-and-left arrows but those are too hard to draw in plain text).

That is, in this case we have (at least) three commits on master (there may be any number of commits before commit A that we simply did not draw in). The tip of master is commit C, which points back to commit B, which points back to A.

We have two commits on develop that are not also on master: commit E is the tip of develop and E points back to D, while D points back to B. Commit B, along with all of its ancestors (A and anything earlier), is on both master and develop.

Meanwhile commit G is the tip of project; G points back to F which points back to E, and so on. This means commits A and B are, in fact, on all three branches. But wait, there's more! H is the tip of <sprint_number> and H points back to G, and so on; and I is the tip of <task_number> and I points back to H.

In the end, this means that commits A and B are on (at least) five branches (the five shown here), D and E are on at least four branches, and so on.
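To check this membership claim against a real repository, the sketch below rebuilds the drawing in a scratch repository and asks git which branch names can reach a given commit. The branch names sprint-1 and task-1 are hypothetical stand-ins for <sprint_number> and <task_number>, and the commit helper and its file names are made up for illustration.

```shell
# Build the example DAG in a throwaway repository.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in the drawing
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: one commit per letter, each touching its own file.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b develop;  commit D; commit E
git checkout -q -b project;  commit F; commit G
git checkout -q -b sprint-1; commit H
git checkout -q -b task-1;   commit I
git checkout -q master;      commit C

# Which branch names can reach commit B (the parent of master's tip)?
git branch --contains master~1    # lists all five branches
# Which can reach commit E (the tip of develop)?
git branch --contains develop     # lists develop, project, sprint-1, task-1
```

A commit is "on" a branch here simply because it is reachable from that branch's tip, which is all that git branch --contains tests.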

Decide if rebasing is needed and allowed

In git, rebasing actually means copying commits to new, slightly different/modified commits. (This may not be the right approach. We'll get to that later, though, because it won't make sense until you know more.)

The tip of master is now commit C rather than commit B. Presumably, earlier, the tip of master was B, and that was when we made commit D (and maybe E as well). But now you're considering rebasing develop onto the new tip of master.

To achieve this you must copy commits D and E to new, different commits. We'll call these copies D' and E'. Even if nothing else changes—and it's likely that something else does change, specifically whatever is different between B and C will go into the new D'—the copy D' of original commit D has to point to commit C rather than to commit B.

Drawing just this copy phase (leaving out everything hung off the original E) we get:

A - B - C             <-- master
     \    \
      \     D' - E'   <-- develop (after rebase)
       \
        D - E         [abandoned]

(I've simplified the left pointing arrows this time too, now that we know that commits point leftward.) But while the original D and E are no longer pointed-to by branch name develop, they're still reachable once we fill in the rest of the drawing:

A - B - C             <-- master
     \    \
      \     D' - E'   <-- develop (after rebase)
       \
        D-E
           \
            F-G       <-- project
               \
                H     <-- <sprint_number>
                 \
                  I   <-- <task_number>

What's particularly significant at this point is that original commits D and E are no longer on develop.
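The drawing above can be reproduced in a scratch repository. This is only a sketch: sprint-1 and task-1 are hypothetical names for the placeholder branches, and the commit helper is invented for the example.

```shell
# Rebuild the example DAG in a throwaway repository.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b develop;  commit D; commit E
git checkout -q -b project;  commit F; commit G
git checkout -q -b sprint-1; commit H
git checkout -q -b task-1;   commit I
git checkout -q master;      commit C

old_E=$(git rev-parse develop)     # remember the original commit E
git checkout -q develop
git rebase -q master               # copies D and E to D' and E' on top of C

git log --oneline --graph --all    # the original D and E still show up, under project
git branch --contains "$old_E"     # project, sprint-1, task-1 (develop is not listed)
```

The graph output should show two commits with subject D and two with subject E: the originals, still reachable from project, and the copies, now on develop.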

How rebase works

Ignoring --fork-point (which can be a solution here), the git rebase command really takes three arguments, one of which is normally just taken from HEAD:

  • the tip-most commit to copy (this is normally just "your current branch", i.e., HEAD);
  • a specifier that limits which commits to copy, i.e., specifies—but indirectly—commits not to copy; and
  • the identity of the commit to which the first copied commit will be added.

The latter two are usually combined into one <upstream> argument. Meanwhile you first do a git checkout of the branch to rebase, to set the first argument. For instance, if we were to decide to rebase develop onto master:

git checkout develop
git rebase master

Here the tip-most commit to copy is the HEAD commit as usual, which because of the git checkout is the tip-most commit of develop, and the starting place at which the new copies will be grown is the tip of master. Git starts by considering copying every commit that is on develop (which would be A, B, D, and E), but it's told here to avoid copying every commit that is on master, which means A, B, and C.

(Wait, what? We're not supposed to copy C? But we weren't going to copy C in the first place! Well, no problem then, we just won't copy it!) That's how we can combine the two things into one <upstream> argument. We want to add the new copies after C, and at the same time, avoid copying C and everything in the path leading back from C.

So if we choose to go ahead and do this git rebase, we'll copy D and E to D' and E' and end up with the new graph fragment we drew.
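Before running the rebase, the set of commits it would copy can be previewed with an ordinary revision range. The sketch below assumes a simple linear history like the one drawn (no merges, no already-copied commits); the commit helper is hypothetical.

```shell
# Minimal version of the drawing: just master and develop.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b develop; commit D; commit E
git checkout -q master;     commit C

# Commits on develop that are not on master: the ones "git rebase master" would copy.
git log --format=%s master..develop   # prints E, then D
# Commits on master that are not on develop: what the copies will gain as history.
git log --format=%s develop..master   # prints C
```

The range master..develop is the same "on develop, but not on master" rule described above, written in git's revision syntax.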

That's great for develop, but what happens now if we do:

git checkout project
git rebase develop

This time, git considers everything reachable from the tip of project (these are G, F, E, D, B, and A, and maybe something more), minus everything reachable from the already-rebased develop (A, B, C, D', and E'). That leaves D, E, F, and G to copy onto the tip of develop, i.e., commit E'.

This is a problem. It may be a self-solving one, if we're lucky, because rebase will detect some cases of copied commits and avoid re-copying them. That is, when git goes to copy D to a(nother) new copy D'', it may detect that D is already present in E'. If it does detect this it will just skip the copy. The same happens when it goes to copy E to E'': it may detect that this is not needed, and skip the copy.

On the other hand, git's detector may be fooled, and it might copy D and/or E. We definitely don't want that, so it's best to avoid asking git to copy them at all.
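One way to see in advance which commits git considers already copied is git cherry, which compares commits by the change they introduce rather than by hash. This is a sketch in a scratch repository with a hypothetical commit helper; in a real history with conflicts or edited copies, the result can differ.

```shell
# master, develop and project as in the drawing.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b develop; commit D; commit E
git checkout -q -b project; commit F; commit G
git checkout -q master;     commit C

git checkout -q develop && git rebase -q master   # develop now ends in D' and E'

# Compare project's commits against the rebased develop.
git cherry -v develop project
# "-" marks a commit whose change already exists in develop (here D and E);
# "+" marks a commit that would still need to be copied (here F and G).
```

If D' or E' had needed conflict resolution during the first rebase, the changes would no longer match and those lines would show "+" instead, which is the "fooled detector" case.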

There are a number of ways to ask, including an interactive rebase (where we get to edit the pick instructions, so we can delete the two pick lines for commits D and E), or being more clever with arguments to git rebase:

git checkout project
git rebase --onto develop 'develop@{1}'

This second command uses the reflog history to tell git that the commits to copy are those that are on project (the current branch) that are not contained within the previous tip of develop. That is, 'develop@{1}' resolves to the commit ID of original (un-copied) commit E. This will therefore copy just commits F and G, to F' and G'.
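Here is the whole sequence as a sketch in a scratch repository. It names the old tip (original commit E) through develop's reflog as develop@{1}, which assumes the rebase of develop is the most recent update to that branch. The commit helper is hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b develop; commit D; commit E
git checkout -q -b project; commit F; commit G
git checkout -q master;     commit C

old_E=$(git rev-parse develop)
git checkout -q develop && git rebase -q master

# The reflog remembers where develop pointed before the rebase.
git rev-parse 'develop@{1}'        # same ID as $old_E

# Copy only the commits after the old tip (F and G) onto the new develop.
git checkout -q project
git rebase -q --onto develop 'develop@{1}'

git log --format=%s develop..project   # prints G, then F
```

If other updates to develop have happened since the rebase, the index in develop@{1} has to be adjusted, so it is worth checking git reflog show develop first.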

(Incidentally, if you draw your DAGs on a whiteboard with colored markers, you can use colors to represent the original commits and their copies. I find this easier to read than all the D' and D'' notation. I just can't draw it on StackOverflow.)

We can repeat this process with the sprint and task, using the reflog to identify commits to leave out.

Since git 1.9, git rebase now has --fork-point, which essentially automates what we're doing here with the reflogs. (There was a bug fix in git 2.1 for git rebase --fork-point failing to discover commits that don't need to be copied, so it would be wise to limit using this option to 2.1-or-later.) That could therefore be a way to do this.
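A sketch of the --fork-point variant, again in a scratch repository with a hypothetical commit helper. It assumes develop's reflog still holds the pre-rebase tip, which is the case here because the rebase has only just happened.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b develop; commit D; commit E
git checkout -q -b project; commit F; commit G
git checkout -q master;     commit C

old_E=$(git rev-parse develop)
git checkout -q develop && git rebase -q master

# Ask git where project forked from develop, consulting develop's reflog.
fork=$(git merge-base --fork-point develop project)
echo "$fork"                        # the ID of original commit E

# Let rebase do the same lookup itself and copy only F and G.
git checkout -q project
git rebase -q --fork-point develop

git log --format=%s develop..project   # prints G, then F
```

Running git merge-base --fork-point on its own first is a cheap way to confirm git picks the commit you expect before any branch is rewritten.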

Finally, before returning to the question of whether this is a good idea at all, I'll make one more note. Instead of rebasing develop on master, and project on develop, and so on, suppose we started by rebasing the task. This would tell git to copy commit D to D', E to E', F to F', and so on all the way down to copying I to I'. The task branch would then point to new commit I', whose history chain reaches back to C. Now all we need to do here is re-point the sprint branch, the project branch, and the develop branch at the copied commits, by finding the right copy. The updated develop should point to E'; the updated project should point to G'; and the updated sprint branch should point to H'.

If there are additional sprint and/or task branches, they probably need to have some commit(s) copied that would not be copied by the above, though, so this trick has to be used carefully. As always, it will help to draw the DAG first.
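The task-first approach can be sketched as follows, assuming the strictly linear chain from the drawing, so that each copy can be found by counting back from the new task tip. The names sprint-1 and task-1 and the commit helper are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b develop;  commit D; commit E
git checkout -q -b project;  commit F; commit G
git checkout -q -b sprint-1; commit H
git checkout -q -b task-1;   commit I
git checkout -q master;      commit C

# One rebase copies D through I on top of C.
git checkout -q task-1
git rebase -q master

# Re-point the other branch names at the matching copies.
git branch -f sprint-1 task-1~1    # H'
git branch -f project  task-1~2    # G'
git branch -f develop  task-1~4    # E'

git log --oneline --graph --all    # one straight line from I' back to A
```

The ~1, ~2 and ~4 offsets are specific to this drawing. In a real repository the right copies have to be located by inspecting the log, and git branch -f refuses to move the branch that is currently checked out.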

Is rebasing right?

If you have a branch structure this complex, rebasing may be the wrong approach. Even if not, it may still be the wrong way to do this.

Remember that, as we just saw, rebasing involves copying commits, and then moving branch labels to point to the new copies, instead of the originals. When you do this with a repository that only you use, it's usually not too terribly confusing, because you move all your branch labels and you are now done: you either have the old, pre-copy state, or the new, post-copy state, and you can ignore all the intermediate (mid-rebase) state except for the brief period of doing all these rebases.

If someone else is sharing this repository, though, consider what you will do to them. Before you did all this massive rebasing, they had what they thought were the right develop, project, sprint, and task branch pointers. They were using the original (not yet copied) commits and making their own new commits that depend on those original commits.

Now you come along and tell them: "Oh, hey, forget all those old commits! Use these brand-new shiny ones instead!" Now they have to go find everything they did that depended on the old commits, and update all of those to depend instead on the new ones.

In other words, they must deal with an "upstream rebase" or, in fact, with numerous upstream rebases. It's generally not a lot of fun (though the same --fork-point code that makes it possible for you to automate this also makes it possible for them to automate their recovery from the upstream rebases).

There is a time limit on --fork-point, because it uses reflog entries, and reflog entries expire. If you have not reconfigured things, git defaults to expiring the critical reflog entries after 30 days, so if you do this, everyone else has about a month to recover from it.
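The 30-day figure corresponds, as far as I know, to the gc.reflogExpireUnreachable setting, which covers reflog entries no longer reachable from the branch's current tip (such as original commit E after the rebase). The sketch below shows how to inspect and raise it in one repository; the "90 days" value is an arbitrary example, not a recommendation.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q

# Show the current value; no output here means git's built-in default applies.
git config --get gc.reflogExpireUnreachable || echo "unset: built-in default applies"

# Keep unreachable reflog entries longer in this repository only.
git config gc.reflogExpireUnreachable "90 days"
git config --get gc.reflogExpireUnreachable
```

Reflogs are local, so each collaborator would have to change this in their own clone for it to extend their recovery window.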

answered Jan 13 '23 13:01 by torek