My branch structure looks like this:
master
  develop
    project
      <sprint_number>
        <task_number>
I work on the task_number branch. Then I merge task with the sprint branch. Then I merge sprint with the project branch. In this way, all of the commits on project are sprints, and all of the commits on sprint are tasks. After merging into project branch, I submit a merge request and a code review is performed before merging into develop.
Should I do a rebase all the way down the chain? For example:
git checkout develop
git rebase master
git checkout project
git rebase develop
git checkout <sprint_number>
git rebase project
git checkout <task_number>
git rebase <sprint_number>
Git branch names don't actually nest in any sense: they're just pointers to specific commits.
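A quick way to convince yourself of this is to create several branch names in a scratch repository and ask git what each one resolves to. This is only a sketch: the temporary directory, the demo identity, and the commit helper function are my own scaffolding, not part of your setup.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master    # force the branch name used in the text
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: one commit that adds one small file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git branch develop                         # a second name for the same commit
git branch project                         # and a third
master_id=$(git rev-parse master)
develop_id=$(git rev-parse develop)
project_id=$(git rev-parse project)
echo "master  -> $master_id"
echo "develop -> $develop_id"
echo "project -> $project_id"
```

All three names print the same hash: nothing about "develop" lives inside "master", they simply point at the same commit until one of them moves.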
What we need to do here, as usual, is draw some commit Directed Acyclic Graph (DAG) fragments and consider cases where rebasing makes sense. So we start with your example:
master develop project <sprint_number> <task_number>
and add some nodes (giving them single uppercase letters instead of their "true name" hashes like a1cf93a..., since those are too big and unwieldy):
A <- B <- C   <-- master
      \
       D <- E   <-- develop
             \
              F <- G   <-- project
                    \
                     H   <-- <sprint_number>
                      \
                       I   <-- <task_number>
(the backslashes here should be up-and-left arrows but those are too hard to draw in plain text).
That is, in this case we have (at least) three commits on master (there may be any number of commits before commit A that we simply did not draw in). The tip of master is commit C, which points back to commit B, which points back to A.
We have two commits on develop that are not also on master: commit E is the tip of develop and E points back to D, while D points back to B. Commit B, along with all of its ancestors (A and anything earlier), is on both master and develop.
Meanwhile commit G is the tip of project; G points back to F, which points back to E, and so on. This means commits A and B are, in fact, on all three branches. But wait, there's more! H is the tip of <sprint_number> and H points back to G, and so on; and I is the tip of <task_number> and I points back to H.
In the end, this means that commits A and B are on (at least) five branches (the five shown here), D and E are on at least four branches, and so on.
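You can check these counts yourself with git branch --contains. The sketch below rebuilds the drawing in a scratch repository; sprint-1 and task-1 are stand-ins I picked for <sprint_number> and <task_number>, and the commit helper is hypothetical scaffolding.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: one commit that adds one small file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B; B=$(git rev-parse HEAD)
git checkout -q -b develop;  commit D; D=$(git rev-parse HEAD); commit E
git checkout -q -b project;  commit F; commit G
git checkout -q -b sprint-1; commit H
git checkout -q -b task-1;   commit I
git checkout -q master;      commit C; C=$(git rev-parse HEAD)

# Count how many branch names can reach each commit.
on_b=$(git branch --contains "$B" | wc -l | tr -d ' ')
on_d=$(git branch --contains "$D" | wc -l | tr -d ' ')
on_c=$(git branch --contains "$C" | wc -l | tr -d ' ')
echo "B is on $on_b branches, D on $on_d, C on $on_c"
```

Commit C, made after the other branches forked off at B, is reachable only from master.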
In git, rebasing actually means copying commits to new, slightly different/modified commits. (This may not be the right approach. We'll get to that later, though, because it won't make sense until you know more.)
The tip of master is now commit C rather than commit B. Presumably, earlier, the tip of master was B, and that was when we made commit D (and maybe E as well). But now you're considering rebasing develop onto the new tip of master.
To achieve this you must copy commits D and E to new, different commits. We'll call these copies D' and E'. Even if nothing else changes (and it's likely that something else does change: whatever is different between B and C will go into the new D'), the copy D' of original commit D has to point to commit C rather than to commit B.
Drawing just this copy phase (leaving out everything hung off the original E) we get:
A - B - C           <-- master
     \   \
      \   D' - E'   <-- develop (after rebase)
       \
        D - E       [abandoned]
(I've simplified the left-pointing arrows this time too, now that we know that commits point leftward.) But while the original D and E are no longer pointed to by the branch name develop, they're still reachable once we fill in the rest of the drawing:
A - B - C           <-- master
     \   \
      \   D' - E'   <-- develop (after rebase)
       \
        D - E
             \
              F - G   <-- project
                   \
                    H   <-- <sprint_number>
                     \
                      I   <-- <task_number>
What's particularly significant at this point is that original commits D and E are no longer on develop.
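Here is a small experiment that shows this, again in a scratch repository with a hypothetical commit helper (the sprint and task branches are left out to keep it short). It rebases develop onto master and then asks whether the original E is still an ancestor of develop and of project.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: one commit that adds one small file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b develop; commit D; commit E; old_E=$(git rev-parse HEAD)
git checkout -q -b project; commit F; commit G
git checkout -q master;     commit C

git checkout -q develop
git rebase -q master                       # copies D and E to D' and E'
new_E=$(git rev-parse develop)

# Is the original E still reachable from each branch name?
if git merge-base --is-ancestor "$old_E" develop; then on_develop=yes; else on_develop=no; fi
if git merge-base --is-ancestor "$old_E" project; then on_project=yes; else on_project=no; fi
echo "old E: $old_E"
echo "new E: $new_E"
echo "old E on develop: $on_develop, on project: $on_project"
```

The copy E' has a different hash from E, and the original E survives only because project (and anything built on it) still reaches it.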
Ignoring --fork-point (which can be a solution here), the git rebase command really takes three arguments, one of which is normally just taken from HEAD: the tip-most commit to copy (normally HEAD); the commit after which the new copies are to be placed; and the commits to leave out of the copying. The latter two are usually combined into one <upstream> argument. Meanwhile you first do a git checkout of the branch to rebase, to set the first argument. For instance, if we were to decide to rebase develop onto master:
git checkout develop
git rebase master
Here the tip-most commit to copy is the HEAD commit as usual, which because of the git checkout is the tip-most commit of develop, and the starting place at which the new copies will be grown is the tip of master. Git starts by considering copying every commit that is on develop (which would be A, B, D, and E), but it's told here to avoid copying every commit that is on master, which means A, B, and C.
(Wait, what? We're not supposed to copy C? But we weren't going to copy C in the first place! Well, no problem then, we just won't copy it!) That's how we can combine the two things into one <upstream> argument. We want to add the new copies after C, and at the same time, avoid copying C and everything in the path leading back from C.
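The set "on develop but not on master" is what the two-dot range master..develop selects, so you can preview what a rebase would copy before running it. A sketch in a scratch repository (the commit helper is hypothetical):

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: one commit that adds one small file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b develop; commit D; commit E
git checkout -q master;     commit C

# Commits reachable from develop, minus commits reachable from master,
# listed oldest first (the order in which rebase would copy them).
to_copy=$(git log --reverse --format=%s master..develop | xargs)
echo "rebase would copy: $to_copy"
```

A, B, and C never show up in the list: A and B because they are on master, C because it was never on develop to begin with.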
So if we choose to go ahead and do this git rebase, we'll copy D and E to D' and E' and end up with the new graph fragment we drew.
That's great for develop, but what happens now if we do:
git checkout project
git rebase develop
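This time, we'll ask git to consider everything reachable from the tip of project (these are G, F, E, D, B, and A, and maybe something more), leave out whatever is reachable from the already-rebased develop, and copy the rest onto the tip of develop, i.e., commit E'. Since the original D and E are no longer on develop, the commits left to copy are D, E, F, and G.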
This is a problem. It may be a self-solving one, if we're lucky, because rebase will detect some cases of copied commits and avoid re-copying them. That is, when git goes to copy D to a(nother) new copy D'', it may detect that D is already present in the history of E' (as D'). If it does detect this it will just skip the copy. The same happens when it goes to copy E to E'': it may detect that this is not needed, and skip the copy.
On the other hand, git's detector may be fooled, and it might copy D and/or E. We definitely don't want that, so it's best to avoid asking git to copy them at all.
There are a number of ways to do this, including an interactive rebase (where we get to edit the pick instructions, so we can delete the two pick lines for commits D and E), or being more clever with arguments to git rebase:
git checkout project
git rebase --onto develop 'develop@{1}'
This second command uses the reflog history to tell git that the commits to copy are those that are on project (the current branch) that are not contained within the previous tip of develop. That is, 'develop@{1}' resolves to the commit ID of original (un-copied) commit E, which is where develop pointed before it was rebased. This will therefore copy just commits F and G, to F' and G'.
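Here is the whole sequence as a sketch in a scratch repository (hypothetical commit helper; sprint and task branches omitted). It relies on the reflog being enabled, which is the default for an ordinary non-bare repository, and on no other update to develop having happened since its rebase.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: one commit that adds one small file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b develop; commit D; commit E; old_E=$(git rev-parse HEAD)
git checkout -q -b project; commit F; commit G
git checkout -q master;     commit C

git checkout -q develop
git rebase -q master                       # develop: E -> E'
previous=$(git rev-parse 'develop@{1}')    # where develop pointed before

git checkout -q project
git rebase -q --onto develop 'develop@{1}' # copy only F and G onto E'
history=$(git log --reverse --format=%s project | xargs)
echo "project history: $history"
```

Each of D and E appears exactly once in the resulting history of project, which is the point: nothing was copied twice.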
(Incidentally, if you draw your DAGs on a whiteboard with colored markers, you can use colors to represent the original commits and their copies. I find this easier to read than all the D' and D'' notation. I just can't draw it on StackOverflow.)
We can repeat this process with the sprint and task, using the reflog to identify commits to leave out.
Since git 1.9, git rebase has --fork-point, which essentially automates what we're doing here with the reflogs. (There was a bug fix in git 2.1 for git rebase --fork-point failing to discover commits that don't need to be copied, so it would be wise to limit using this option to 2.1-or-later.) That could therefore be a way to do this.
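As a sketch of what that looks like: git merge-base --fork-point shows which commit git would pick from the reflog of develop, and git rebase --fork-point develop then uses it. Same scratch-repository scaffolding as before (temporary directory, demo identity, hypothetical commit helper).

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: one commit that adds one small file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b develop; commit D; commit E; old_E=$(git rev-parse HEAD)
git checkout -q -b project; commit F; commit G
git checkout -q master;     commit C
git checkout -q develop
git rebase -q master                       # develop: E -> E'

# Ask git where project forked from develop, consulting develop's reflog.
fork=$(git merge-base --fork-point develop project)
echo "fork point: $fork"

git checkout -q project
git rebase -q --fork-point develop         # copies only what follows the fork point
history=$(git log --reverse --format=%s project | xargs)
echo "project history: $history"
```

The fork point comes out as the original E, so only F and G are copied, just as with the hand-written develop@{1} form.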
Finally, before returning to the question of whether this is a good idea at all, I'll make one more note. Instead of rebasing develop on master, and project on develop, and so on, suppose we started by rebasing the task. This would tell git to copy commit D to D', E to E', F to F', and so on all the way down to copying I to I'. The task branch would then point to new commit I', whose history chain reaches back to C. Now all we need to do here is re-point the sprint branch, the project branch, and the develop branch at the copied commits, by finding the right copy. The updated develop should point to E'; the updated project should point to G'; and the updated sprint branch should point to H'.
If there are additional sprint and/or task branches, they probably need to have some commit(s) copied that would not be copied by the above, though, so this trick has to be used carefully. As always, it will help to draw the DAG first.
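For the simple linear case in the drawing, the trick can be sketched with git branch -f and relative names like task-1~1. The branch names sprint-1 and task-1 are stand-ins for your placeholders, the commit helper is hypothetical, and the ~N offsets are counted off this particular drawing: with a different graph you would need to find the right copies some other way.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: one commit that adds one small file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b develop;  commit D; commit E
git checkout -q -b project;  commit F; commit G
git checkout -q -b sprint-1; commit H
git checkout -q -b task-1;   commit I
git checkout -q master;      commit C

git checkout -q task-1
git rebase -q master                       # copies D through I onto C

# Re-point the other names at the matching copies (offsets fit this graph only).
git branch -f sprint-1 task-1~1            # H'
git branch -f project  task-1~2            # G'
git branch -f develop  task-1~4            # E'

for b in develop project sprint-1 task-1; do
  echo "$b -> $(git log -1 --format=%s "$b")"
done
```

git branch -f refuses to move the branch you currently have checked out, which is fine here because only task-1 is checked out and the rebase has already moved it.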
If you have a branch structure this complex, rebasing may be the wrong approach. Even if not, it may still be the wrong way to do this.
Remember that, as we just saw, rebasing involves copying commits, and then moving branch labels to point to the new copies, instead of the originals. When you do this with a repository that only you use, it's usually not too terribly confusing, because you move all your branch labels and you are now done: you either have the old, pre-copy state, or the new, post-copy state, and you can ignore all the intermediate (mid-rebase) state except for the brief period of doing all these rebases.
If someone else is sharing this repository, though, consider what you will do to them. Before you did all this massive rebasing, they had what they thought were the right develop, project, sprint, and task branch pointers. They were using the original (not yet copied) commits and making their own new commits that depend on those original commits.
Now you come along and tell them: "Oh, hey, forget all those old commits! Use these brand-new shiny ones instead!" Now they have to go find everything they did that depended on the old commits, and update all of those to depend instead on the new ones.
In other words, they must deal with an "upstream rebase", or in fact, with numerous upstream rebases. It's generally not a lot of fun (though the same --fork-point code that makes it possible for you to automate this also makes it possible for them to automate their recovery from the upstream rebases).
There is a time limit on --fork-point, because it uses reflog entries, and reflog entries expire. If you have not reconfigured things, git defaults to expiring the critical reflog entries after 30 days, so if you do this, everyone else has about a month to recover from it.
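The setting that governs those entries is gc.reflogExpireUnreachable (entries for commits no longer reachable from the branch tip); its sibling gc.reflogExpire covers the reachable ones. If people want a longer recovery window, each of them can raise it in their own clone. A sketch, where the value of 90 days is purely an illustration and the scratch repository stands in for a real clone:

```shell
set -eu
repo=$(mktemp -d)                          # stand-in for a real clone (assumption)
cd "$repo"
git init -q .

# Show the current setting; nothing is printed when it is unset,
# in which case git's built-in default applies.
git config --get gc.reflogExpireUnreachable || echo "unset: built-in default applies"

# Keep reflog entries for rewritten-away commits longer (illustrative value).
git config gc.reflogExpireUnreachable "90 days"
current=$(git config --get gc.reflogExpireUnreachable)
echo "gc.reflogExpireUnreachable = $current"
```

This is a per-repository setting, so changing it in your clone does nothing for anyone else's.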