 

Why does git rebase delete a file added in the most recent commit if it was deleted by the rebase branch?

I'm trying to figure out why git rebase causes a newly created file to be deleted if the branch I'm rebasing off of deleted it. For example:

A1 - A2 - A3
 \
  B1

A2 = add a new file test.txt
A3 = delete test.txt
B1 = add the exact same file as A2

If B1 is checked out and I execute git rebase A3, test.txt is still deleted. I'd expect the result to be:

A1 - A2 - A3 - B1

That would mean test.txt still exists. Why is test.txt deleted after the rebase?

asked Dec 10 '22 17:12 by maxbart


1 Answer

Wow, this was a tough one! :-)

Using your script, I reproduced the problem. There was something very odd about it all, though, so first I trimmed out the rebase step, leaving this (slightly modified) script:

#!/bin/sh
set -e
if [ -d testing_git ]; then
    echo test dir testing_git already exists - halting
    exit 1
fi

mkdir testing_git
cd testing_git

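git init
git symbolic-ref HEAD refs/heads/master   # keep the first branch named master, whatever init.defaultBranch says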
touch main.txt
git add .
git commit -m "initial commit"

# setup B branch
git checkout -b B
echo hello > test.txt
git add .
git commit -m "added test.txt"

# setup master
git checkout master
echo hello > test.txt
git add .
git commit -m "added test.txt"
rm test.txt
git add .
git commit -m "remove test.txt"

# rebase step, trimmed out for now
# git checkout B
# git rebase master

Once the script has run, inspecting the commits gives this:

$ git log --graph --decorate | sed 's/@/ /'
* commit 249e4893ea7458f45fe5cdc496ddc0292a3f03ef (HEAD -> master)
| Author: Chris Torek <chris.torek gmail.com>
| Date:   Thu May 5 20:28:02 2016 -0700
| 
|     remove test.txt
|  
* commit a132dc9e3939b5338f7c784c58da9c83f4902c8d (B)
| Author: Chris Torek <chris.torek gmail.com>
| Date:   Thu May 5 20:28:02 2016 -0700
| 
|     added test.txt
|  
* commit 81c4d9be82094fdb4c88ed0a53bdbd5c3dfd7a5a
  Author: Chris Torek <chris.torek gmail.com>
  Date:   Thu May 5 20:28:02 2016 -0700

      initial commit

Note that master's parent commit is branch B's commit, and there are just three commits, not four. How can this be, when the script runs four git commit commands?

Now let's add sleep 2 to the script, right after git checkout master, re-run it, and see what happens...

[edit]
$ sh testrebase.sh
[snip output]
$ cd testing_git && git log --oneline --decorate --graph --all
* cddbff1 (HEAD -> master) remove test.txt
* c4ac1b2 added test.txt
| * fefc150 (B) added test.txt
|/  
* 8c07bb6 initial commit

Whoa, now we have four commits, and a proper branch!

Why did the first script make three commits, and adding sleep 2 change it to make four commits?

The answer lies in the identity of a commit. Each commit has a (supposedly!) unique ID, which is a checksum of the contents of the commit. Here's what was in the B-branch commit, the first time around:

$ git cat-file -p B | sed 's/@/ /'
tree c3cd0188a6a1490204e25547986e49b0b445dec8
parent 81c4d9be82094fdb4c88ed0a53bdbd5c3dfd7a5a
author Chris Torek <chris.torek gmail.com> 1462505282 -0700
committer Chris Torek <chris.torek gmail.com> 1462505282 -0700

added test.txt

We have the tree, the parent, two (name, email, timestamp) triples for author and committer, a blank line, and the log message. The parent is the first commit on the master branch and the tree is the tree we made when we added test.txt (with its contents).
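To see that the ID really is nothing more than a hash of those lines, the sketch below (an illustration, not part of the original script) re-hashes a commit's raw text and compares it with the commit's own ID. It assumes a POSIX shell, `mktemp -d` for a throwaway repository, and placeholder identity values so that no `user.name`/`user.email` configuration is needed. `git hash-object -t commit --stdin` prepends Git's `commit <size>` object header before hashing, so the two values should agree.

```shell
#!/bin/sh
# Sketch: a commit ID is the hash of the commit's raw text (plus Git's
# "commit <size>\0" object header, which hash-object adds for us).
set -e

repo=$(mktemp -d)          # throwaway repository (assumption: mktemp -d exists)
cd "$repo"
git init -q

# Placeholder identity (assumed values), so no user config is required.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

echo hello > test.txt
git add test.txt
git commit -q -m "added test.txt"

# The raw object: tree, author, committer, blank line, message
# (no parent line here, since this is a root commit).
git cat-file commit HEAD

# Hash exactly those bytes as a commit object, without writing it (no -w).
recomputed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)
actual=$(git rev-parse HEAD)
echo "recomputed: $recomputed"
echo "actual:     $actual"
```

Change any byte (a timestamp, one character of the message) and the hash changes; leave every byte alone and you get the same ID back, which is exactly what bit the original script.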

Then, when we went to make the second commit on the master branch, Git made a new tree from the new files. This tree was bit-for-bit identical to the one we had just committed on branch B, so it got the same unique ID (remember, there's only one copy of that tree in the repo, so this is correct behavior). Then Git made a new commit object with my name, email, and timestamps as usual, plus the log message. But because the script ran fast enough that both commits were made within the same second, this commit was also bit-for-bit identical to the one we had just made on branch B: same tree, same parent, same author and committer lines, same message. So it got the same ID as before, and Git simply made branch master point to that existing commit.

In other words, we re-used the commit. We just made it on a different branch (so that master pointed to the same commit as B).
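The same-second coincidence can be made deterministic. This sketch replays the trimmed script but pins every author and committer date to one value (1462505282 -0700, borrowed from the `git cat-file` output above) via Git's `GIT_AUTHOR_DATE`/`GIT_COMMITTER_DATE` environment variables, so the result no longer depends on machine speed. The throwaway directory and placeholder identity are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: the trimmed script with all timestamps pinned to the same second,
# so B's commit and master's "added test.txt" commit are byte-identical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

# Placeholder identity plus pinned dates (assumed values).
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
export GIT_AUTHOR_DATE="1462505282 -0700" GIT_COMMITTER_DATE="1462505282 -0700"

touch main.txt
git add .
git commit -q -m "initial commit"

# setup B branch
git checkout -q -b B
echo hello > test.txt
git add .
git commit -q -m "added test.txt"

# setup master: same parent, tree, identity, dates, and message as B's commit
git checkout -q master
echo hello > test.txt
git add .
git commit -q -m "added test.txt"
git rm -q test.txt
git commit -q -m "remove test.txt"

# Four git commit runs, three distinct commits; master^ is B's commit.
git --no-pager log --oneline --decorate --graph --all
```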

Adding sleep 2 changed the time stamp on the new commit. Now the two commits (in B and master) are no longer bit-for-bit identical:

$ git cat-file -p B | sed 's/@/ /' > bx
$ git cat-file -p master^ | sed 's/@/ /' > mx
$ diff bx mx
3,4c3,4
< author Chris Torek <chris.torek gmail.com> 1462505765 -0700
< committer Chris Torek <chris.torek gmail.com> 1462505765 -0700
---
> author Chris Torek <chris.torek gmail.com> 1462505767 -0700
> committer Chris Torek <chris.torek gmail.com> 1462505767 -0700

Different time stamps = different commits = much more sensible setup.
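The plumbing command `git commit-tree` makes the same point without branches: given one tree and one parent, two commits that differ only in timestamp get different IDs while still sharing the tree object. In this sketch the helper `make_commit` is a hypothetical name, the two Unix times are the ones from the diff above (two seconds apart, as with `sleep 2`), and the repository and identity are throwaway placeholders.

```shell
#!/bin/sh
# Sketch: same tree, same parent, same message; only the timestamp differs.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

touch main.txt
git add main.txt
git commit -q -m "initial commit"

echo hello > test.txt
git add test.txt
tree=$(git write-tree)        # one tree object, reused by both commits below
parent=$(git rev-parse HEAD)

# Hypothetical helper: build an "added test.txt" commit at Unix time $1.
make_commit() {
    GIT_AUTHOR_DATE="$1 -0700" GIT_COMMITTER_DATE="$1 -0700" \
        git commit-tree -p "$parent" -m "added test.txt" "$tree"
}

c1=$(make_commit 1462505765)  # like branch B's commit
c2=$(make_commit 1462505767)  # two seconds later, like master's

echo "commit IDs: $c1 $c2"
echo "their trees: $(git rev-parse "$c1^{tree}") $(git rev-parse "$c2^{tree}")"
```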

Actually executing the rebase, though, dropped the file anyway!

It turns out that this is by design. When you run git rebase, the setup code does not simply list every commit for cherry-picking; instead it uses the --cherry-pick --right-only options of git rev-list to leave out any commit whose change is already in the upstream.¹

Since a commit making exactly the same change (adding test.txt with the same contents) is already in the upstream, Git just drops B's commit entirely: the assumption here is that you sent it upstream to someone, they already took it, and there is no need to take it again.
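"The same change" here means the same *patch ID*: a hash of the diff alone, ignoring commit metadata such as dates. The sketch below rebuilds the `sleep 2` variant (pinning the two "added test.txt" commits two seconds apart instead of sleeping; temp directory and identity are placeholders) and shows that the two commits have different commit IDs but equal patch IDs. `git cherry master B` reports the same thing, marking B's commit with `-` because an equivalent change is already in master.

```shell
#!/bin/sh
# Sketch: different commits, identical patch IDs.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

touch main.txt
git add .
git commit -q -m "initial commit"

git checkout -q -b B
echo hello > test.txt
git add .
GIT_AUTHOR_DATE="1462505765 -0700" GIT_COMMITTER_DATE="1462505765 -0700" \
    git commit -q -m "added test.txt"

git checkout -q master
echo hello > test.txt
git add .
GIT_AUTHOR_DATE="1462505767 -0700" GIT_COMMITTER_DATE="1462505767 -0700" \
    git commit -q -m "added test.txt"
git rm -q test.txt
git commit -q -m "remove test.txt"

# patch-id prints "<patch-id> <commit-id>"; keep the first field.
pid_b=$(git show B | git patch-id --stable | cut -d' ' -f1)
pid_m=$(git show master^ | git patch-id --stable | cut -d' ' -f1)
echo "B       $(git rev-parse --short B)  patch-id $pid_b"
echo "master^ $(git rev-parse --short master^)  patch-id $pid_m"

# "-" means an equivalent change already exists in master.
git cherry -v master B
```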

Let's modify the reproducer script again (and this time we can take out the sleep 2, speeding things up) so that the change to master is different and will not be removed from the list via --cherry-pick --right-only. We will still add test.txt with the same single line, but we will also modify main.txt in that commit:

# setup master
git checkout master
echo hello > test.txt
echo and also slight difference >> main.txt
git add .
git commit -m "added test.txt"

We can go ahead and turn on the final git checkout B and git rebase master lines as well, and this time, rebasing works as we originally expected:

$ git log --oneline --decorate --graph --all
* c31b13a (HEAD -> B) added test.txt
* da2ca52 (master) remove test.txt
* 6972019 added test.txt
* 0f0d2e8 initial commit
$ ls
main.txt   test.txt

I had not realized that rebase did this; it is not something I expected (even though, as the other answer points out, it is documented), and it means that saying "rebase is just repeated cherry-pick" is not quite correct: it's repeated cherry-pick, with special cases of dropping commits.
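If you actually want the dropped commit replayed, newer Git releases document a `--reapply-cherry-picks` option for `git rebase` that turns this special case off. The sketch below assumes a Git new enough to have it, rebuilds the `sleep 2` variant with pinned dates (placeholder repo and identity as before), runs a default rebase (test.txt disappears), then rebases a copy of the original B on a hypothetical branch `B2` with the option (test.txt survives).

```shell
#!/bin/sh
# Sketch: default rebase drops B's commit; --reapply-cherry-picks keeps it.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

touch main.txt
git add .
git commit -q -m "initial commit"

git checkout -q -b B
echo hello > test.txt
git add .
GIT_AUTHOR_DATE="1462505765 -0700" GIT_COMMITTER_DATE="1462505765 -0700" \
    git commit -q -m "added test.txt"

git checkout -q master
echo hello > test.txt
git add .
GIT_AUTHOR_DATE="1462505767 -0700" GIT_COMMITTER_DATE="1462505767 -0700" \
    git commit -q -m "added test.txt"
git rm -q test.txt
git commit -q -m "remove test.txt"

orig_b=$(git rev-parse B)

# Default: B's commit is recognised as already upstream and dropped.
git checkout -q B
git rebase -q master
default_files=$(git ls-tree -r --name-only B)
echo "default rebase, files on B:" $default_files

# Replay it anyway, starting from B's original commit (B2 is a hypothetical name).
git checkout -q -b B2 "$orig_b"
git rebase -q --reapply-cherry-picks master
reapply_files=$(git ls-tree -r --name-only B2)
echo "--reapply-cherry-picks, files on B2:" $reapply_files
```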


¹ Actually, for non-interactive rebase, it uses this remarkable bit:

git format-patch -k --stdout --full-index --cherry-pick --right-only \
--src-prefix=a/ --dst-prefix=b/ --no-renames --no-cover-letter \
"$revisions" ${restrict_revision+^$restrict_revision} \
>"$GIT_DIR/rebased-patches"

where $revisions expands, in this case, to master...B.

The --cherry-pick --right-only options to git format-patch are not documented; one must know to look in the git rev-list documentation for them.
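Those rev-list options can be tried on their own. This sketch (same pinned-date `sleep 2` setup as above, with a placeholder repo and identity) lists `master...B` three ways: with `--left-right` marking each commit's side (`<` for master, `>` for B), with `--cherry-pick` also removing the equivalent pair of "added test.txt" commits, and with `--cherry-pick --right-only`, which is the list rebase would replay; it comes out empty.

```shell
#!/bin/sh
# Sketch: what --cherry-pick and --right-only do to master...B.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

touch main.txt
git add .
git commit -q -m "initial commit"

git checkout -q -b B
echo hello > test.txt
git add .
GIT_AUTHOR_DATE="1462505765 -0700" GIT_COMMITTER_DATE="1462505765 -0700" \
    git commit -q -m "added test.txt"

git checkout -q master
echo hello > test.txt
git add .
GIT_AUTHOR_DATE="1462505767 -0700" GIT_COMMITTER_DATE="1462505767 -0700" \
    git commit -q -m "added test.txt"
git rm -q test.txt
git commit -q -m "remove test.txt"

echo "master...B by side (< master, > B):"
git rev-list --left-right master...B
echo "after --cherry-pick removes the equivalent pair:"
git rev-list --left-right --cherry-pick master...B
echo "what rebase would replay (--cherry-pick --right-only):"
git rev-list --cherry-pick --right-only master...B
```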

Interactive rebase uses a different technique but still selects away any commits that are already in the upstream. You can see this if you change rebase to rebase -i: the instruction sheet consists of a single noop line instead of the expected single pick line.

answered Feb 02 '23 20:02 by torek