I'm trying to figure out why git rebase causes a newly created file to be deleted if the branch I'm rebasing off of deleted it. For example:
A1 - A2 - A3
  \
   B1
A2 = add a new file test.txt
A3 = delete test.txt
B1 = add the exact same file as A2
If B1 is checked out and I execute git rebase A3, test.txt is still deleted. I'd expect the result to be:
A1 - A2 - A3 - B1
Which would mean that test.txt still exists. Why is test.txt deleted after the rebase?
Wow, this was a tough one! :-)
Using your script, I reproduced the problem. There was something very odd about it all though, so first, I trimmed out the rebase step, leaving this (slightly modified) script:
#!/bin/sh
set -e
if [ -d testing_git ]; then
    echo "test dir testing_git already exists - halting"
    exit 1
fi
mkdir testing_git
cd testing_git
git init
touch main.txt
git add .
git commit -m "initial commit"
# setup B branch
git checkout -b B
echo hello > test.txt
git add .
git commit -m "added test.txt"
# setup master
git checkout master
echo hello > test.txt
git add .
git commit -m "added test.txt"
rm test.txt
git add .
git commit -m "remove test.txt"
Once run, inspecting the commits, I get this:
$ git log --graph --decorate | sed 's/@/ /'
* commit 249e4893ea7458f45fe5cdc496ddc0292a3f03ef (HEAD -> master)
| Author: Chris Torek <chris.torek gmail.com>
| Date: Thu May 5 20:28:02 2016 -0700
|
| remove test.txt
|
* commit a132dc9e3939b5338f7c784c58da9c83f4902c8d (B)
| Author: Chris Torek <chris.torek gmail.com>
| Date: Thu May 5 20:28:02 2016 -0700
|
| added test.txt
|
* commit 81c4d9be82094fdb4c88ed0a53bdbd5c3dfd7a5a
  Author: Chris Torek <chris.torek gmail.com>
  Date: Thu May 5 20:28:02 2016 -0700

  initial commit
Note that master's parent commit is branch B's commit, and there are just three commits, not four. How can this be, when the script runs four git commit commands?
Now let's add sleep 2 to the script, right after git checkout master, and re-run it and see what happens...
[edit]
$ sh testrebase.sh
[snip output]
$ cd testing_git && git log --oneline --decorate --graph --all
* cddbff1 (HEAD -> master) remove test.txt
* c4ac1b2 added test.txt
| * fefc150 (B) added test.txt
|/
* 8c07bb6 initial commit
Whoa, now we have four commits, and a proper branch!
Why did the first script make three commits, and why did adding sleep 2 change it to make four?
The answer lies in the identity of a commit. Each commit has a (supposedly!) unique ID, which is a checksum of the contents of the commit. Here's what was in the B-branch commit, the first time around:
$ git cat-file -p B | sed 's/@/ /'
tree c3cd0188a6a1490204e25547986e49b0b445dec8
parent 81c4d9be82094fdb4c88ed0a53bdbd5c3dfd7a5a
author Chris Torek <chris.torek gmail.com> 1462505282 -0700
committer Chris Torek <chris.torek gmail.com> 1462505282 -0700

added test.txt
We have the tree, the parent, two (name, email, timestamp) triples for author and committer, a blank line, and the log message. The parent is the first commit on the master branch and the tree is the tree we made when we added test.txt (with its contents).
Then, when we went to make the second commit on the master branch, git made a new tree from the new files. This tree was bit-for-bit identical to the one we just committed on branch B, so it got the same unique ID (remember, there's only one copy of that tree in the repo, so this is correct behavior). Then it made a new commit object with my name and email and timestamps as usual, and the log message. The timestamps have one-second resolution, and the script ran fast enough that both commits were made within the same second. So this commit was bit-for-bit identical to the commit we just made on branch B: same tree, same parent, same author and committer lines, same message. We therefore got the same ID as before, and git made branch master point to that commit.
In other words, we re-used the commit. We just made it on a different branch (so that master pointed to the same commit as B).
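The effect can be reproduced without relying on timing luck. Here is a minimal sketch that pins the timestamps through the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables, so both commits are guaranteed to carry identical fields. The "Example" identity, the branch name other, and the temporary directory are assumptions made up for this illustration; they are not part of the original script.

```shell
#!/bin/sh
# Sketch: two commits with the same tree, parent, identity, timestamps and
# message are the same object, so they hash to the same ID.
set -e
repo=$(mktemp -d)          # throwaway directory (assumption for the demo)
cd "$repo"
git init -q

# Pin every variable field of the commit object (hypothetical identity).
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
export GIT_AUTHOR_DATE="1462505282 -0700" GIT_COMMITTER_DATE="1462505282 -0700"

touch main.txt
git add .
git commit -q -m "initial commit"
base=$(git rev-parse HEAD)

# First "added test.txt" commit, on branch B.
git checkout -q -b B
echo hello > test.txt
git add .
git commit -q -m "added test.txt"
id_b=$(git rev-parse HEAD)

# Same change, same message, same pinned timestamp, on a second branch
# that starts from the same parent.
git checkout -q -b other "$base"
echo hello > test.txt
git add .
git commit -q -m "added test.txt"
id_other=$(git rev-parse HEAD)

echo "B:     $id_b"
echo "other: $id_other"
```

Both lines should print the same ID, since nothing in the two commit objects differs. Changing either pinned date by even one second before the second commit should give two distinct IDs, which is what sleep 2 achieves in the script.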
Adding sleep 2 changed the time stamp on the new commit. Now the two commits (in B and master) are no longer bit-for-bit identical:
$ git cat-file -p B | sed 's/@/ /' > bx
$ git cat-file -p master^ | sed 's/@/ /' > mx
$ diff bx mx
3,4c3,4
< author Chris Torek <chris.torek gmail.com> 1462505765 -0700
< committer Chris Torek <chris.torek gmail.com> 1462505765 -0700
---
> author Chris Torek <chris.torek gmail.com> 1462505767 -0700
> committer Chris Torek <chris.torek gmail.com> 1462505767 -0700
Different time stamps = different commits = much more sensible setup.
Actually executing the rebase, though, dropped the file anyway!
It turns out that this is by design. When you run git rebase, the setup code does not simply list every commit for cherry-picking, but instead uses git rev-list --cherry-pick --right-only to leave out commits that it should drop.[1]
Since a commit that makes the same change (adding test.txt with the same contents) is already in the upstream, Git just drops yours entirely: the assumption here is that you sent it upstream to someone, they already took it, and there is no need to take it again.
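You can watch the selection happen by running the rev-list query yourself. The following is a rough sketch, not rebase's actual code: it rebuilds the four-commit layout with pinned timestamps two seconds apart (standing in for sleep 2) and compares the commit list with and without --cherry-pick. The commit_at helper, the "Example" identity, and the temporary directory are invented for this illustration.

```shell
#!/bin/sh
# Sketch: --cherry-pick removes commits whose change already exists on the
# other side of the symmetric difference master...B.
set -e
repo=$(mktemp -d)          # throwaway directory (assumption for the demo)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Hypothetical helper: commit with a pinned timestamp ($1) and message ($2).
commit_at() {
    GIT_AUTHOR_DATE="$1 -0700" GIT_COMMITTER_DATE="$1 -0700" \
        git commit -q -m "$2"
}

touch main.txt
git add .
commit_at 1462505760 "initial commit"
git branch -M master       # make sure the first branch is named master

# Branch B adds test.txt.
git checkout -q -b B
echo hello > test.txt
git add .
commit_at 1462505765 "added test.txt"

# master adds the identical file two seconds later, then removes it.
git checkout -q master
echo hello > test.txt
git add .
commit_at 1462505767 "added test.txt"
git rm -q test.txt
commit_at 1462505768 "remove test.txt"

# Commits on B that are not on master, without and with the filter.
all=$(git rev-list --right-only master...B)
kept=$(git rev-list --cherry-pick --right-only master...B)
echo "without --cherry-pick: ${all:-<none>}"
echo "with    --cherry-pick: ${kept:-<none>}"
```

Without the filter, the one commit on B is listed. With --cherry-pick the list should come out empty, because the commit on master introduces the same change, and so rebase has nothing left to replay.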
Let's modify the reproducer script again (and we will be able to take out the sleep 2 this time, speeding things up) so that the change to master is different, and will not be removed from the list via --cherry-pick --right-only. We will still add test.txt with the same single line, but we will also modify main.txt in that commit:
# setup master
git checkout master
echo hello > test.txt
echo and also slight difference >> main.txt
git add .
git commit -m "added test.txt"
We can go ahead and turn on the final git checkout B and git rebase master lines as well, and this time, rebasing works as we originally expected:
$ git log --oneline --decorate --graph --all
* c31b13a (HEAD -> B) added test.txt
* da2ca52 (master) remove test.txt
* 6972019 added test.txt
* 0f0d2e8 initial commit
$ ls
main.txt test.txt
I had not realized that rebase did this; it is not something I expected (even though, as the other answer points out, it is documented). It means that saying "rebase is just repeated cherry-pick" is not quite correct: it's repeated cherry-pick, with the special case of dropping commits whose changes are already upstream.
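For completeness, here is a sketch of the modified scenario as one self-contained script, checking both the rev-list selection and the end result of the rebase. As before, the "Example" identity and the temporary directory are assumptions for the illustration, and the script is a condensed variant of the reproducer rather than the original.

```shell
#!/bin/sh
# Sketch: when the upstream commit also touches main.txt, the two changes
# are no longer equivalent, so rebase keeps and replays the commit on B.
set -e
repo=$(mktemp -d)          # throwaway directory (assumption for the demo)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

touch main.txt
git add .
git commit -q -m "initial commit"
git branch -M master       # make sure the first branch is named master

# Branch B adds test.txt only.
git checkout -q -b B
echo hello > test.txt
git add .
git commit -q -m "added test.txt"

# master adds the same test.txt but also changes main.txt, then removes
# test.txt again.
git checkout -q master
echo hello > test.txt
echo and also slight difference >> main.txt
git add .
git commit -q -m "added test.txt"
git rm -q test.txt
git commit -q -m "remove test.txt"

# The commit on B now survives the --cherry-pick filter...
kept=$(git rev-list --cherry-pick --right-only master...B)
echo "kept for replay: ${kept:-<none>}"

# ...so the rebase replays it on top of master and the file comes back.
git checkout -q B
git rebase -q master
ls
```

Here kept should hold the single commit from B, and after the rebase both main.txt and test.txt should be present in the work tree.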
[1] Actually, for non-interactive rebase, it uses this remarkable bit:
git format-patch -k --stdout --full-index --cherry-pick --right-only \
--src-prefix=a/ --dst-prefix=b/ --no-renames --no-cover-letter \
"$revisions" ${restrict_revision+^$restrict_revision} \
>"$GIT_DIR/rebased-patches"
where $revisions expands, in this case, to master...B.
The --cherry-pick --right-only options to git format-patch are not documented; one must know to look in the git rev-list documentation for them.
Interactive rebase uses a different technique but still selects away any commits that are already in the upstream. This shows up if you change rebase to rebase -i, in that the rebase instructions consist of one noop line instead of the expected single pick line.