
How does git-rebase recognize "aliased" commits?

I'm trying to better understand the magic behind git-rebase. I was very pleasantly surprised today by the following behavior, which I didn't expect.

TLDR: I rebased a shared branch, causing all commit sha1s to change. Despite this, a derived branch was able to accurately identify that its original commits were "aliased" into new commits with different sha1s. The rebase didn't create any mess at all.

Details

Take a master branch: M1

Branch it off into branch-X, with some additional commits added: M1-A1-B1-C1. Note down the git-log output.

Branch off branch-X into branch-Y, with one additional commit added: M1-A1-B1-C1-D1. Note down the git-log output.

Add a new commit to the tip of the master branch: M1-M2

Rebase branch-X onto the updated master: M1-M2-A2-B2-C2. Note that A2-B2-C2 all have the same message, contents, and author date as A1-B1-C1. However, they have completely different sha1 values, as well as different commit dates. According to this writeup, the SHA1 is different because each commit's parent has changed.

Rebase branch-Y onto the updated branch-X. Result: M1-M2-A2-B2-C2-D2.

Notably, only the D1 commit is applied (and becomes D2). The A1-B1-C1 commits in branch-Y are completely ignored by git-rebase. You can see this in the output logs.

This is wonderful, but how does git-rebase know to ignore A1-B1-C1? How does git-rebase know that A2-B2-C2 are the same as A1-B1-C1, and hence, can be safely ignored? I had always assumed that git keeps track of commits using the sha1 identifier, but despite the above commits having different sha1s, git still somehow knows that they are linked together. How does it do that? Given the above behavior, when is it truly dangerous to rebase a shared branch?

RvPr asked Aug 23 '17 19:08


1 Answer

Internally, git rebase lists the commits that should be rebased and computes a patch-id for each of them. Unlike the commit id, the patch-id hashes only the content of the patch (the diff the commit introduces), not the content of the tree and commit objects. So A1 and A2, while having different commit ids, have the same patch-id. git rebase then skips any commit whose patch-id already appears among the commits on the branch it is rebasing onto.
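To make the difference between the two hashes concrete, here is a toy model in Python. It is only a sketch and not Git's actual object format or patch-id algorithm: the names commit_id and patch_id, the dict-based "trees" and the sample file names are all made up for illustration. The point is just that one hash includes the parent and the other looks only at the added and removed lines.

```python
import difflib
import hashlib


def commit_id(parent_id: str, tree: dict, message: str) -> str:
    """Toy commit id: covers the parent id, the full snapshot and the message."""
    payload = repr((parent_id, sorted(tree.items()), message))
    return hashlib.sha1(payload.encode()).hexdigest()


def patch_id(parent_tree: dict, tree: dict) -> str:
    """Toy patch id: covers only the lines added or removed relative to the parent."""
    h = hashlib.sha1()
    for path in sorted(set(parent_tree) | set(tree)):
        before = parent_tree.get(path, "").splitlines()
        after = tree.get(path, "").splitlines()
        matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                continue  # unchanged lines do not contribute
            for line in before[i1:i2]:
                h.update(f"-{path}:{line}\n".encode())
            for line in after[j1:j2]:
                h.update(f"+{path}:{line}\n".encode())
    return h.hexdigest()


# Hypothetical snapshots mirroring the question: A is replayed from M1 onto M2.
m1 = {"README": "hello"}
a1 = {**m1, "a.txt": "alpha"}     # A1 = M1 plus a new file
m2 = {**m1, "CHANGELOG": "v2"}    # M2 = new commit on master
a2 = {**m2, "a.txt": "alpha"}     # A2 = the same change, on top of M2

m1_id = commit_id("", m1, "M1")
m2_id = commit_id(m1_id, m2, "M2")
a1_id = commit_id(m1_id, a1, "A")
a2_id = commit_id(m2_id, a2, "A")

print(a1_id == a2_id)                            # False: the parent differs
print(patch_id(m1, a1) == patch_id(m2, a2))      # True: the change is the same
```

In this model A1 and A2 get different commit ids purely because their parents differ, while their patch ids match because the change relative to the parent is identical.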

For more information, search for "patch-id" on this page: https://git-scm.com/book/en/v2/Git-Branching-Rebasing
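If you want to check this on a real repository, Git ships a git patch-id command that reads a patch on standard input and prints a "<patch-id> <commit-id>" pair. Below is a small Python wrapper around it, as a sketch only: the helper names git_patch_id and parse_patch_id_line are mine, and it assumes git is on your PATH and that the commit is a regular (non-merge) commit, so that git show produces a diff.

```python
import subprocess


def parse_patch_id_line(line: str) -> tuple:
    """Split one line of `git patch-id` output into (patch_id, commit_id)."""
    patch, commit = line.split()
    return patch, commit


def git_patch_id(commit: str, repo: str = ".") -> str:
    """Return the patch-id of `commit` by piping `git show` into `git patch-id`.

    Assumes a non-merge commit; with no diff on stdin, `git patch-id`
    prints nothing and the parsing below would fail.
    """
    diff = subprocess.run(
        ["git", "-C", repo, "show", commit],
        check=True, capture_output=True,
    ).stdout
    out = subprocess.run(
        ["git", "-C", repo, "patch-id"],
        input=diff, check=True, capture_output=True,
    ).stdout
    return parse_patch_id_line(out.decode())[0]
```

With the branches from the question, comparing git_patch_id("<sha of A1>") with git_patch_id("<sha of A2>") should give the same value even though the two sha1s differ.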


Relevant section from above (diagrams missing):

If someone on your team force pushes changes that overwrite work that you’ve based work on, your challenge is to figure out what is yours and what they’ve rewritten.

It turns out that in addition to the commit SHA-1 checksum, Git also calculates a checksum that is based just on the patch introduced with the commit. This is called a “patch-id”.

If you pull down work that was rewritten and rebase it on top of the new commits from your partner, Git can often successfully figure out what is uniquely yours and apply them back on top of the new branch.

For instance, in the previous scenario, if instead of doing a merge when we’re at "Someone pushes rebased commits, abandoning commits you’ve based your work on" we run git rebase teamone/master, Git will:

  • Determine what work is unique to our branch (C2, C3, C4, C6, C7)
  • Determine which are not merge commits (C2, C3, C4)
  • Determine which have not been rewritten into the target branch (just C2 and C3, since C4 is the same patch as C4')
  • Apply those commits to the top of teamone/master

This only works if C4 and C4' that your partner made are almost exactly the same patch. Otherwise the rebase won’t be able to tell that it’s a duplicate and will add another C4-like patch (which will probably fail to apply cleanly, since the changes would already be at least somewhat there).
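The selection steps in the quoted list can be sketched as a simple filter. This is an illustration under assumptions, not how git rebase is implemented: the Commit class, the commits_to_replay function and the patch-id labels ("p2", "p4", ...) are hypothetical stand-ins, and which upstream commits exist besides C4' is made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    name: str
    patch_id: str      # hypothetical, precomputed label standing in for a real patch-id
    parents: int = 1   # more than one parent means a merge commit


def commits_to_replay(ours, upstream):
    """Drop merges and commits whose patch-id already exists upstream."""
    upstream_patches = {c.patch_id for c in upstream}
    keep = []
    for c in ours:
        if c.parents > 1:
            continue   # merge commits are not replayed
        if c.patch_id in upstream_patches:
            continue   # same patch already on the target branch
        keep.append(c.name)
    return keep


# Work unique to our branch, as in the quoted example (C6 and C7 are merges).
ours = [
    Commit("C2", "p2"),
    Commit("C3", "p3"),
    Commit("C4", "p4"),
    Commit("C6", "p6", parents=2),
    Commit("C7", "p7", parents=2),
]

# Upstream contains C4', a rewritten copy of C4 with an identical patch.
upstream = [Commit("C5", "p5"), Commit("C4'", "p4")]
print(commits_to_replay(ours, upstream))          # ['C2', 'C3']

# If the partner changed C4' even slightly, its patch-id no longer matches.
upstream_edited = [Commit("C5", "p5"), Commit("C4'", "p4-modified")]
print(commits_to_replay(ours, upstream_edited))   # ['C2', 'C3', 'C4']
```

The second call shows the caveat from the last paragraph: once the patches differ, C4 is no longer recognized as a duplicate and gets replayed again.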

Matthieu Moy answered Sep 19 '22 13:09