Say that I have the following scenario:
A -- B -- C -- D -- H  master
      \            /
       E -- F -- G     topicA
             \
              I -- J -- K  topicB
topicA was merged into master using the --squash switch, which means that master doesn't know the history of topicA.
If I now merge master into topicB and then do a diff master...topicB, the diff is messed up and contains a lot of changes or undoings which shouldn't be there.
In that case, I usually merge master into topicA and then topicA into topicB before doing what was said in the previous paragraph. However, sometimes that's not possible (e.g. the branch was deleted) and I end up with a lot of conflicts.
How should I proceed in this case? Do I have any misconception?
Is rebase --onto master topicA topicB the right solution?
As DavidN said in a comment, your plan looks sound enough.
(The rest of this is far too long and rambly; it was written between / during other tasks.)
Since H was created by git merge --squash, the drawing is wrong. It should read:
A -- B -- C -- D -- H  master
      \
       E -- F -- G     topicA
             \
              I -- J -- K  topicB
The key difference is that commit H is not related to the E--F--G sequence, at least not in any Git-detected sense. Commit H's content is affected by whatever happened in the E--F--G sequence (and, of course, by whatever happened in the C--D sequence) but as far as Git knows now, someone came along and wrote H without even looking at E--F--G.
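To see this in an actual repository, here is a minimal sketch that rebuilds the corrected graph in a throwaway directory. The commit helper, the one-file-per-commit layout, and the tags named after each commit are assumptions made purely for illustration; none of them come from the question. The last two commands check that H has D as its only parent and that the merge base of master and topicA is still B.

```sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master

# commit NAME...: for each name, add NAME.txt, commit it, and tag the commit NAME
# (hypothetical helper, only here to keep the sketch short)
commit() {
  for name in "$@"; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
    git tag "$name"
  done
}

commit A B
git checkout -q -b topicA
commit E F
git checkout -q -b topicB
commit I J K
git checkout -q topicA
commit G
git checkout -q master
commit C D

# The squash "merge": stages the combined changes of E, F, G but records no second parent
git merge -q --squash topicA
git commit -q -m H
git tag H

git log -1 --format=%P H        # one parent hash only: that of D
git merge-base master topicA    # the hash of B, not of G
git rev-parse D B               # for comparison with the two lines above
```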
I'm going to have a fairly big digression here now.
If I now merge master into topicB
OK, let's draw that as a commit graph, to make sure that this means what you intend. I will use my usual form (slightly more compact, with arrows from branch-names scooted over to the right a bit):
A--B--C---D----H       <-- master (still points to H)
    \           \
     E--F--G     \     <-- topicA (still points to G)
         \        \
          I--J--K--L   <-- topicB (points to new L)
Note that I drew a real merge, not a fake, not-a-merge-at-all "squash merge". This really does matter, as we will see.
When Git goes to make this new commit L, it has to merge commits H and K. To do that, it has to find their merge base (in some cases there can be several merge bases, but here there's just one).
The merge base(s) of any two commits is / are the Lowest Common Ancestors: that is, the commits closest to the two starting commits (H and K) that are reachable from both of those starting commits.
Let's start with H and K themselves first. H is reachable from H (of course) but not from K. K is reachable from K (of course) but not from H. Now we can check D vs H and K: D is reachable from H but not from K. Now we can check J, but it's not reachable from H. Now we consider C and I, and F, and E, but it's not until we get all the way back to commit B that we find a commit reachable from both H and K. Commit A would also work, but it's further away from both H and K, so commit B is the merge base.
The merge then starts with two diffs:
git diff B H
and
git diff B K
The first diff shows what we changed going from B to H. Of course, H has what we changed in C and D, plus whatever we changed in E, F, and G. The second diff shows what we changed going from B to K. Of course, K has what we changed in E and F (topicB forks off at F), plus whatever we changed in I--J--K.
Together, these two diffs have whatever we changed in E and F twice, but Git usually (not always, but usually) does a good job of noticing that and picking up the change only once. So commit L probably has everything from every previous commit, done just once.
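The merge-base search and the two diffs can be reproduced with a small sketch. It assumes a throwaway repository in which each commit only adds a file named after itself; the commit helper and the tags A through L are hypothetical conveniences, not anything from the question. Because the changes from E and F are identical on both sides here, the real merge goes through cleanly, which is the easy case; real histories will not always be this kind.

```sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

# commit NAME...: add NAME.txt, commit it, tag the commit NAME (illustrative helper)
commit() {
  for name in "$@"; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
    git tag "$name"
  done
}

commit A B
git checkout -q -b topicA
commit E F
git checkout -q -b topicB
commit I J K
git checkout -q topicA
commit G
git checkout -q master
commit C D
git merge -q --squash topicA
git commit -q -m H
git tag H

git merge-base master topicB    # the hash of B
git diff --name-only B H        # what master changed: files from C, D, E, F, G
git diff --name-only B K        # what topicB changed: files from E, F, I, J, K

# A real merge of master into topicB, producing L with two parents (K and H)
git checkout -q topicB
git merge -q -m L master
git tag L
git ls-tree --name-only L       # every file from every commit, each listed once
```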
and then do a diff master...topicB
Note that this is using the three-dot ... syntax, not the two-dot .. syntax. I'm not sure what you intend here, but the three-dot syntax essentially means "find the (or a) merge base". So let's go through this exercise again: master still points to commit H and topicB now points to the new merge commit L, and we find the merge base of H and L, now that we have a real merge (none of this stupid "squash merge" fake merge stuff for us, no way!).
So let's start with H and L themselves first. L is reachable from L (of course) but not from H. H is reachable from H (of course) and also from L. This means the merge base of H and L is H: the merge base of master and topicB is master.
... diff master...topicB
Since master is on the left of the triple-dot, it's replaced with the merge base, which is commit H. The right side of the triple-dot is resolved to its commit, which is commit L. The diff then shows you whatever is different between H and L.
In this case, the effect is the same as for git diff master..topicB, which means the same thing as git diff master topicB: compare commits H and L, in that order.
That should be a pretty sensible diff, in spite of the horrible fake squash-merge we did initially to make H. The real merge sort of repaired this, at least for H vs L.
Let's draw this thing yet again but this time using the fake not-a-merge git merge --squash technique. The contents of our new commit L will be the same as if we had done a real merge, but the graph will be different:
A--B--C---D----H       <-- master (still points to H)
    \
     E--F--G           <-- topicA (still points to G)
         \
          I--J--K--L   <-- topicB (points to new L)
Now we go back to:
diff master...topicB
Once again, we need to find the merge base between H and L, but now L does not point back to both K and H, but only to K. Neither H nor L is the merge base. Neither D nor J works either: we can't walk backwards to D from L, and we can't walk backwards to J from H. In fact, the merge base commit is again commit B, so this means the same thing as:
git diff B L
and this diff will be quite different.
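The squash variant can be tried the same way. This sketch again assumes a scratch repository where every commit adds one file named after itself (the commit helper and tags are made up for the example). After the squash of master into topicB, the merge base stays at B, so the three-dot diff also reports everything master already has.

```sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

# commit NAME...: add NAME.txt, commit it, tag the commit NAME (illustrative helper)
commit() {
  for name in "$@"; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
    git tag "$name"
  done
}

commit A B
git checkout -q -b topicA
commit E F
git checkout -q -b topicB
commit I J K
git checkout -q topicA
commit G
git checkout -q master
commit C D
git merge -q --squash topicA
git commit -q -m H
git tag H

# Squash "merge" of master into topicB: L has K as its only parent
git checkout -q topicB
git merge -q --squash master
git commit -q -m L
git tag L

git merge-base master topicB            # still the hash of B
git diff --name-only master...topicB    # same as: git diff --name-only B L
git diff --name-only master topicB      # H vs L: only what topicB really adds
```

Here the three-dot form lists the files from C, D, E, F and G as well, even though master already contains them, while the two-argument form lists only those from I, J and K.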
I do not know what you were expecting from your diff, so I cannot address this part:
the diff is messed up and contains a lot of changes or undoings which shouldn't be there.
Now let's return to the question:
In that case, I usually merge master into topicA and then topicA into topicB before doing what was said in the previous paragraph. However, sometimes that's not possible (e.g. the branch was deleted) and I end up with a lot of conflicts.
Note that deleting a branch name has no immediate effect upon its commits. What it does do is to stop protecting those commits. That is, because each branch name makes commits reachable, those commits are safe from the Grim Collector ...er... Grim Reaper Garbage Collector. We did that reachability thing several times to find merge bases; Git does it even more often, though, to find commits to keep and commits to discard, during GC; commits to transfer, during push and fetch; and so on. If the commits are protected by some other means (by reachability through a real merge, or by reachability from another branch or tag name, or whatever) they stick around. If you can find them by hash ID, you can bring them back.
More importantly for your rebase case, if you can find them by git log, you can cut them off. We'll see this in a moment.
Because squash "merges" are not actually merges at all, they won't protect the other chain of commits, and (this is usually the key to the future merge conflicts) they do not provide future merges with updated merge bases. This means those future merges must examine huge diffs, instead of small diffs, and then Git's automated "redundant change" detection fails.
What this means in practice depends on how you use these squash not-a-merge "merges". When you use them to take a line of development and reduce it to a single commit, it's probably a good idea to stop using that other line of development entirely. You can save it (using a branch or tag name, or even some other reference outside the branch and tag name spaces so that you don't normally see it, that keeps the commit chain from being GC-ed) or just let it get reaped, but either way you probably should not continue working on it, and that includes any other branches you have that fork off from some commit(s) on it.
git rebase
Is rebase --onto master topicA topicB the right solution?
Using git rebase, you can copy these other chains (your topicB, in this case) to new chains and then point the label (topicB) to the tip of the copied chain. The commits you want to copy are those that were not squashed: here, that's the I--J--K chain. Using topicA as the <upstream> argument to git rebase will select the right set of commits. Note that topicA reaches commits G, F, E, B, and A, while topicB reaches K, J, I, F, E, and so on; so using topicA as <upstream> chops off everything from F on back, but then requires the explicit --onto that you provided.
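A minimal sketch of that rebase, run in a scratch repository built to match the graph. The commit helper, the file-per-commit content, and the tags are assumptions for illustration; in particular, the tags keep pointing at the original I, J and K, while topicB moves to the copies.

```sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

# commit NAME...: add NAME.txt, commit it, tag the commit NAME (illustrative helper)
commit() {
  for name in "$@"; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
    git tag "$name"
  done
}

commit A B
git checkout -q -b topicA
commit E F
git checkout -q -b topicB
commit I J K
git checkout -q topicA
commit G
git checkout -q master
commit C D
git merge -q --squash topicA
git commit -q -m H
git tag H

git log --reverse --format=%s topicA..topicB    # the commits to copy: I, J, K

# Copy I--J--K onto H and move the topicB label to the last copy
git rebase -q --onto master topicA topicB

git log --reverse --format=%s master..topicB    # the copies of I, J, K
git merge-base master topicB                    # now the hash of H
```

After this, git diff master...topicB and git diff master topicB compare the same pair of commits, since master is an ancestor of the new topicB.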
If the label topicA were deleted, you could still do this rebase; it just gets trickier. What you would need to do is to specify either of commits G or F by their hash IDs, so as to chop off commits F and earlier. The hash ID of G is anywhere from hard-to-find (GC has not deleted it but it is unreachable from any live reference) to non-existent (GC has deleted it). The ID for F, however, is right there in the topicB chain: K's parent is J, J's parent is I, and I's parent is F. The problem is that there is no easy way to determine that commit F was in the set of commits that were in the chain that the earlier git merge --squash handled.
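Here is a sketch of the deleted-label case. It assumes you have already worked out, by reading git log, that the last squashed commit in the topicB chain is three steps behind its tip; that count (topicB~3) is specific to this toy graph and is exactly the piece Git will not tell you. The commit helper and the tags are illustrative only (in this scratch repository the tag G also happens to keep G alive, which a real repository would not do for you).

```sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

# commit NAME...: add NAME.txt, commit it, tag the commit NAME (illustrative helper)
commit() {
  for name in "$@"; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
    git tag "$name"
  done
}

commit A B
git checkout -q -b topicA
commit E F
git checkout -q -b topicB
commit I J K
git checkout -q topicA
commit G
git checkout -q master
commit C D
git merge -q --squash topicA
git commit -q -m H
git tag H

git branch -q -D topicA            # the label is gone

# Assumption: F is known to be the last already-squashed commit, 3 behind K
fork=$(git rev-parse topicB~3)
git rebase -q --onto master "$fork" topicB

git log --reverse --format=%s master..topicB    # the copies of I, J, K
```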
(This is related to, but not quite the same thing as, the earlier remark I bolded.)