
What algorithm is used during "git rebase"?

Tags:

git

I can't find an explanation of how git does a rebase internally.

The most frequent answer is about applying patches on a base commit but I don't see how it can be true, because there's no way to correctly apply a patch to a (possibly) completely changed file. Three-way merge is mentioned but without saying precisely what is merged.

listerreg asked Sep 01 '16 19:09



1 Answer

If you dig through the Git code, you will find that there are in fact multiple different internal (or "back end") algorithms used for rebase. This has evolved over time: Git 2.26 (March 2020) switched the default from the git am-based back end (historically an internal program called git-rebase--am) to the one that drives interactive rebase (historically git-rebase--interactive), and Git 2.12 (Feb 2017) switched the interactive variant from using a shell script to using what Git calls its sequencer. So there's no single right answer that applies to every invocation of git rebase or in every version of Git. Note that even in very old versions of Git, git rebase -i uses the git-rebase--interactive back end.

In any case, think of this as having each commit copied as if by git cherry-pick, or as if by git format-patch and git am. There are some minor differences for the older git am-based algorithm (for instance, its patches are generated with --no-renames, so it does not detect file renames), and in modern Git we mostly use git cherry-pick anyway, but it may be easier to think about this as "getting and applying a patch".

The more interesting thing, then, is really how git am and/or git cherry-pick do their jobs. The long answer is long and boring and probably best shortened to "go look at the source code", but the short answer is: git am tries to apply the change as a patch first, and only if that fails, falls back on a full three-way merge. See the -3 flag in git am for a brief description as well. Meanwhile git cherry-pick just does a straight three-way merge.
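One way to picture the "try the patch first, merge only on failure" behaviour is a strict hunk applier that refuses to guess. The sketch below is a toy model in Python, not Git's code: Hunk and apply_hunk are hypothetical names, and unlike the real git apply it does not search nearby offsets for matching context.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hunk:
    start: int       # 0-based line index where the hunk expects to apply
    old: List[str]   # pre-image lines (context plus removed lines)
    new: List[str]   # post-image lines (context plus added lines)


def apply_hunk(lines: List[str], hunk: Hunk) -> Optional[List[str]]:
    """Apply one hunk strictly; return None if the pre-image does not match."""
    end = hunk.start + len(hunk.old)
    if lines[hunk.start:end] != hunk.old:
        return None  # a caller would fall back to a three-way merge here
    return lines[:hunk.start] + hunk.new + lines[end:]


# A patch made against a file whose second line was "beta".
hunk = Hunk(start=0,
            old=["alpha", "beta", "gamma"],
            new=["alpha", "BETA", "gamma"])

unchanged = ["alpha", "beta", "gamma", "delta"]
rewritten = ["alpha", "b-e-t-a", "gamma", "delta"]

patched = apply_hunk(unchanged, hunk)    # context matches, so the hunk applies
rejected = apply_hunk(rewritten, hunk)   # context differs, so the hunk is refused
```

The rejected case is the situation the question worries about: the patch alone cannot be applied, so something with more information (a merge base) has to take over.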

The merge base used here is a bit unusual. Consider this initial DAGlet and a git rebase that intends to copy A to A' and B to B', appending these atop commit D:

       A--B        <-- branch
      /
...--*------C--D   <-- origin/branch

The first cherry-pick operation picks A, so it diffs commit A against commit * (A's parent, which here is also the merge base of the two branches). With the git am method, Git tries to apply this as a patch atop D. If that fails, or if you're using interactive rebase resulting in git cherry-pick, the whole commit (for cherry-pick) or each file that failed to patch (for git am) gets run through the three-way merge process.

That's actually pretty reasonable: we're really trying to replay *-to-A after dealing with *-to-D. The final result is A':

       A--B        <-- branch
      /
...--*------C--D   <-- origin/branch
                \
                 A'    [detached HEAD]

But now we copy B, which for git am means producing a patch going from A to B. For those parts of the patch that apply, we just apply them. For files where there is a conflict, this time we diff files from A-vs-B and A-vs-A', doing low-level file merges for each failed-to-patch file using these diffs. A is kind of a weird merge base here, but it's clearly better than nothing, and it generally works well.
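To make the low-level file merge concrete, here is a deliberately simplified line-based three-way merge in Python. It is a sketch under assumptions, not Git's xdiff code: edits, merge_file and MergeConflict are hypothetical names, diffs come from Python's difflib, and any two edits that overlap or even touch are reported as a conflict unless they are identical. The base is the file from A, "ours" is the file from A', and "theirs" is the file from B.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Edit = Tuple[int, int, List[str]]  # replace base[start:end] with these lines


class MergeConflict(Exception):
    pass


def edits(base: List[str], other: List[str]) -> List[Edit]:
    """Describe `other` as a list of edits against `base`."""
    matcher = SequenceMatcher(None, base, other, autojunk=False)
    return [(i1, i2, other[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def merge_file(base: List[str], ours: List[str], theirs: List[str]) -> List[str]:
    a, b = edits(base, ours), edits(base, theirs)
    out: List[str] = []
    pos = ia = ib = 0
    while ia < len(a) or ib < len(b):
        ea = a[ia] if ia < len(a) else None
        eb = b[ib] if ib < len(b) else None
        if ea is not None and ea == eb:
            edit = ea          # both sides made the same change: take it once
            ia += 1
            ib += 1
        elif eb is None or (ea is not None and ea[1] < eb[0]):
            edit = ea          # our edit ends strictly before theirs starts
            ia += 1
        elif ea is None or eb[1] < ea[0]:
            edit = eb          # their edit ends strictly before ours starts
            ib += 1
        else:
            raise MergeConflict(f"edits collide near base line {min(ea[0], eb[0])}")
        out.extend(base[pos:edit[0]])  # untouched base lines before the edit
        out.extend(edit[2])            # the replacement lines
        pos = edit[1]
    out.extend(base[pos:])
    return out


file_in_a = ["one", "two", "three", "four", "five"]        # merge base
file_in_a_prime = ["ONE", "two", "three", "four", "five"]  # ours
file_in_b = ["one", "two", "three", "four", "FIVE"]        # theirs

merged = merge_file(file_in_a, file_in_a_prime, file_in_b)
```

Here the two sides changed different lines, so both changes survive in merged. If both had rewritten "three" differently, merge_file would raise MergeConflict, which is the toy equivalent of Git leaving conflict markers in the file.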

For actual git cherry-pick rebases, Git uses the entire commit A as the merge base and does a normal three-way merge across the entire tree. Each file's merge base is the version of the file from commit A.

(If we had more commits past B, this would continue for the rest of them.)
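The whole sequence can be sketched as a loop of cherry-picks, each one a three-way merge whose base is the parent of the commit being copied. This is a toy model in Python, not what sequencer.c does line for line: Commit, merge_trees, cherry_pick and rebase are hypothetical names, a "tree" is just a dict from path to whole-file content, and a file changed differently on both sides raises instead of being handed to a low-level file merge. Renames are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Tree = Dict[str, str]  # path -> file content, standing in for a Git tree


@dataclass
class Commit:
    name: str
    tree: Tree
    parent: Optional["Commit"] = None


class Conflict(Exception):
    pass


def merge_trees(base: Tree, ours: Tree, theirs: Tree) -> Tree:
    """Whole-file three-way merge; a missing path is represented by None."""
    merged: Tree = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o          # both sides agree
        elif o == b:
            result = t          # only "theirs" changed this path
        elif t == b:
            result = o          # only "ours" changed this path
        else:
            raise Conflict(path)
        if result is not None:
            merged[path] = result
    return merged


def cherry_pick(head: Commit, commit: Commit) -> Commit:
    """Copy `commit` onto `head`, using the commit's parent as merge base."""
    assert commit.parent is not None
    tree = merge_trees(commit.parent.tree, head.tree, commit.tree)
    return Commit(commit.name + "'", tree, head)


def rebase(onto: Commit, commits: List[Commit]) -> Commit:
    head = onto
    for commit in commits:
        head = cherry_pick(head, commit)
    return head


# The DAGlet from the text, with made-up file contents.
star = Commit("*", {"f": "1", "g": "x"})
a = Commit("A", {"f": "2", "g": "x"}, star)
b = Commit("B", {"f": "2", "g": "x", "h": "new"}, a)
c = Commit("C", {"f": "1", "g": "y"}, star)
d = Commit("D", {"f": "1", "g": "z"}, c)

b_prime = rebase(d, [a, b])
```

Picking A merges with base *, and picking B merges with base A, "ours" A' and "theirs" B, so b_prime ends up with D's change to g plus the changes from A and B.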

It's easy to see that this must be the case for the non-interactive code, because git am has only the index lines of each patch (carrying full blob hashes, thanks to --full-index) from which to construct base files:

git format-patch -k --stdout --full-index --cherry-pick --right-only \
    --src-prefix=a/ --dst-prefix=b/ --no-renames --no-cover-letter \
    "$revisions" ${restrict_revision+^$restrict_revision} \
    >"$GIT_DIR/rebased-patches"
...
git am $git_am_opt --rebasing --resolvemsg="$resolvemsg" \
    ${gpg_sign_opt:+"$gpg_sign_opt"} <"$GIT_DIR/rebased-patches"

It's trickier to locate in the interactive rebase code, as the critical bits are buried deep in sequencer.c. In older Git, you could just look at the shell script and see how it ran git cherry-pick.
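To show what information those index lines carry, here is a small Python sketch that pulls the pre-image and post-image blob IDs out of a patch. It is only an illustration: blob_ids is a hypothetical helper, the sample patch and its hashes are made up, and the parsing is simplified (paths containing spaces, for example, are not handled).

```python
import re
from typing import Dict, Optional, Tuple

DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$")
INDEX_LINE = re.compile(r"^index ([0-9a-f]+)\.\.([0-9a-f]+)(?: [0-7]+)?$")


def blob_ids(patch_text: str) -> Dict[str, Tuple[str, str]]:
    """Map each patched path to its (pre-image, post-image) blob IDs."""
    result: Dict[str, Tuple[str, str]] = {}
    path: Optional[str] = None
    for line in patch_text.splitlines():
        header = DIFF_HEADER.match(line)
        if header:
            path = header.group(2)
            continue
        index = INDEX_LINE.match(line)
        if index and path is not None:
            result[path] = (index.group(1), index.group(2))
            path = None  # only the first index line after a header counts
    return result


# Made-up patch text; the hashes are placeholders, not real objects.
sample = """\
diff --git a/README b/README
index 1111111111111111111111111111111111111111..2222222222222222222222222222222222222222 100644
--- a/README
+++ b/README
@@ -1 +1 @@
-hello
+hello, world
"""

ids = blob_ids(sample)
```

The first ID of each pair names the version of the file the patch was made against, which is what lets a three-way fallback find a base file when the patch itself does not apply.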

torek answered Oct 03 '22 02:10
