Understanding "git pull --rebase" vs "git rebase"

Tags:

git

version-control

rebase

According to my understanding of git pull --rebase origin master, it should be equivalent to running the following commands:

(from branch master):  $ git fetch origin
(from branch master):  $ git rebase origin/master

I seem to have found a case where this doesn't work as expected. In my workspace, I have the following setup:

branch origin/master references branch master on remote origin
branch master is set up to track origin/master, and is behind origin/master by several commits.
branch feature is set up to track local branch master, and is ahead of master by several commits.
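The setup above can be recreated in a throwaway sandbox. This is a minimal sketch, not the asker's actual repository: origin.git is a hypothetical local bare repository standing in for the real remote, and the single commit is a placeholder. It shows that a branch tracking a local branch records "." as its remote, which is why the second git pull amounts to a git fetch . followed by a rebase:

```shell
#!/bin/sh
set -e
# Throwaway sandbox, isolated from the user's own Git configuration.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"

# Hypothetical bare repository playing the part of the remote "origin".
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
git remote add origin ../origin.git
git commit -q --allow-empty -m A
git push -q -u origin master                     # master tracks origin/master

git checkout -q -b feature
git branch --set-upstream-to=master >/dev/null   # feature tracks local master

# A local upstream is recorded with "." as its remote.
master_remote=$(git config branch.master.remote)
feature_remote=$(git config branch.feature.remote)
feature_merge=$(git config branch.feature.merge)
echo "master:  remote=$master_remote"
echo "feature: remote=$feature_remote merge=$feature_merge"
```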

Sometimes I lose commits when I run the following sequence of steps:

(from branch master):  $ git pull --rebase
(from branch master):  $ git checkout feature
(from branch feature): $ git pull --rebase

At this point, the few commits that feature had ahead of master have been lost. If I instead reset to where I was and do the following:

(from branch feature): $ git reset --hard HEAD@{2} # rewind to before second git pull
(from branch feature): $ git rebase master

The commits have been applied correctly and my new commits on feature are still present. This seems to directly contradict my understanding of how git pull works, unless git fetch . does something stranger than I expected.

Unfortunately, this is not 100% reproducible for all commits. When it does work for a commit, though, it works every time.

Note: My git pull --rebase here should actually be read as a --rebase=preserve, if that matters. I have the following in my ~/.gitconfig:

[pull]
    rebase = preserve


asked Feb 10 '16 16:02

ashays

1 Answer

(Edit, 30 Nov 2016: see also this answer to "Why is git rebase discarding my commits?". It is now virtually certain that the problem is due to the fork-point option.)

There are a few differences between a manual git rebase and a pull-based one (fewer now, in Git 2.7, than in versions of Git that predate the --fork-point option of git merge-base). I also suspect that your automatic preserve-merges setting may be involved. It's hard to be sure, but the fact that your local branch tracks another local branch, which is itself getting rebased, is quite suggestive. Meanwhile, the old git pull script was recently rewritten in C, so it's harder to see what it does (though you can set the environment variable GIT_TRACE to 1 to make Git show you the commands it runs internally).
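To see the GIT_TRACE tip in action, here is a minimal sketch in a throwaway repository, assuming (as in the question) a feature branch that tracks local master, with hypothetical commit names. The exact trace lines differ between Git versions, so it only looks for two fragments: the merge-base --fork-point call that git pull makes, and the --onto argument it passes to git rebase:

```shell
#!/bin/sh
set -e
# Throwaway sandbox, isolated from the user's own Git configuration.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"

# Make one commit that adds a file named after the commit.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
mk A; mk B
git checkout -q -b feature
git branch --set-upstream-to=master >/dev/null   # feature tracks local master
mk I
git checkout -q master
mk C                      # master moves on, so feature needs a real rebase
git checkout -q feature

# GIT_TRACE=1 makes Git (and its child commands) log what they run on stderr.
GIT_TRACE=1 git pull --rebase 2>"$tmp/trace.log" >/dev/null
grep -e 'merge-base --fork-point' -e '--onto' "$tmp/trace.log"
```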

In any case, there are two or three key items here, depending on how you count and split them up; I'll make it three:

git pull runs git fetch, then either git merge or git rebase as instructed, but when it runs git rebase it uses the new fork-point machinery to "recover from an upstream rebase".
When git rebase is run with no arguments it has a special case that invokes the fork-point machinery. When run with arguments, the fork-point machinery is disabled unless explicitly requested with --fork-point.
When git rebase is instructed to preserve merges, it uses the interactive rebase code (non-interactively). I'm not sure this actually matters here (hence "may be involved" above). Normally rebase flattens away merges; only the interactive rebase script has code to preserve them (and that code actually re-does the merges, since there's no other way to deal with them).
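The difference described in the second item can be shown with a minimal sketch (a throwaway repository with hypothetical commits A, B, C, X and Y). Commit X is made on master, feature is started on top of it, and then master is rewound so that X survives only in master's reflog. Run without arguments, git rebase uses the fork point and treats X as upstream work; with master named explicitly, X is copied like any other commit:

```shell
#!/bin/sh
set -e
# Throwaway sandbox, isolated from the user's own Git configuration.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"

# Make one commit that adds a file named after the commit.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
mk A; mk B
mk X                          # X is first committed on master...
git checkout -q -b feature    # ...feature is started on top of it...
git branch --set-upstream-to=master >/dev/null
mk Y
git checkout -q master
git reset -q --hard HEAD~1    # ...then master is rewound; its reflog still lists X
mk C
git checkout -q feature
before=$(git rev-parse HEAD)

git rebase -q                 # no argument: upstream from config, --fork-point implied
kept_no_arg=$(git log --reverse --format=%s master..feature | paste -sd ' ' -)

git reset -q --hard "$before" # put feature back where it was
git rebase -q master          # explicit upstream: fork-point is off by default
kept_explicit=$(git log --reverse --format=%s master..feature | paste -sd ' ' -)

echo "git rebase        kept: $kept_no_arg"
echo "git rebase master kept: $kept_explicit"
```

Here the first rebase drops X without any warning, which is the kind of loss the 30 Nov 2016 edit above attributes to the fork-point option; naming the upstream (or passing --no-fork-point) keeps it.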

The most important item here (for sure) is the fork point code. This code uses the reflog to handle cases best shown by drawing part of the commit graph.

In a normal (no fork point stuff needed) rebase case you have something like this:

... - A - B - C - D - E   <-- origin/foo
            \
              I - J - K   <-- foo

where A and B are commits you had when you started your branch (so that B is the merge-base), C through E are new commits you picked up from the remote via git fetch, and I through K are your own commits. The rebase code copies I through K, attaching the first copy to E, the second to the-copy-of-I, and the third to the-copy-of-J.

Git figures out—or used to, anyway—which commits to copy using git rev-list origin/foo..foo, i.e., using the name of your current branch (foo) to find K and work backwards, and the name of its upstream (origin/foo) to find E and work backwards. The backwards march stops at the merge base, in this case B, and the copied result looks like this:

... - A - B - C - D - E   <-- origin/foo
           \            \
            \             I' - J' - K'   <-- foo
             \
              I - J - K   [foo@{1}: reflog for foo]
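The normal case above can be replayed in a throwaway repository. A minimal sketch, assuming a local branch named upstream as a stand-in for origin/foo (so no remote is needed) and hypothetical commits named after the diagram; it checks the merge base, the commits selected by the upstream..foo range, the result of the rebase, and that foo@{1} still remembers the old K:

```shell
#!/bin/sh
set -e
# Throwaway sandbox, isolated from the user's own Git configuration.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"

# Make one commit that adds a file named after the commit.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/upstream   # local "upstream" plays origin/foo
mk A; mk B
git checkout -q -b foo
mk I; mk J; mk K
old_tip=$(git rev-parse HEAD)
git checkout -q upstream
mk C; mk D; mk E
git checkout -q foo

# The merge base is B, and upstream..foo selects I, J, K (oldest first).
base=$(git log -1 --format=%s "$(git merge-base upstream foo)")
to_copy=$(git log --reverse --format=%s upstream..foo | paste -sd ' ' -)

git rebase -q upstream                      # copies I, J, K after E
after=$(git log --reverse --format=%s upstream..foo | paste -sd ' ' -)
new_base=$(git log -1 --format=%s "$(git merge-base upstream foo)")
echo "before: base=$base copy=[$to_copy]; after: base=$new_base copies=[$after]"
```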

The problem with this method occurs when the upstream—origin/foo here—is itself rebased. Let's say, for instance, that on origin someone force-pushed so that B was replaced by a new copy B' with different commit wording (and maybe a different tree as well, but, we hope, nothing that affects our I-through-K). The starting point now looks like this:

          B' - C - D - E    <-- origin/foo
        /
... - A - B   <-- [origin/foo@{n}]
            \
              I - J - K   <-- foo

Using git rev-list origin/foo..foo, we'd select commits B, I, J, and K to be copied, and try to paste them on after E as usual; but we don't want to copy B as it really came from origin and has been replaced with its own copy B'.

What the fork-point code does is look at the reflog for origin/foo to see whether B was reachable at some time. That is, it checks not just origin/foo (finding E and scanning back to B' and then A), but also origin/foo@{1} (pointing directly to B, probably, depending on how frequently you run git fetch), origin/foo@{2}, and so on. Any commits on foo that are reachable from any origin/foo@{n} are included for consideration in finding a Lowest Common Ancestor node in the graph (i.e., they're all treated as candidates for the merge base that git merge-base prints out).
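A minimal sketch of the upstream-rewrite case, again with a local branch called upstream standing in for origin/foo and hypothetical commits. The force-push is simulated by resetting upstream to A and committing B' (same change, new message) plus C, D and E. A plain merge-base can no longer see B, but --fork-point finds it through upstream's reflog:

```shell
#!/bin/sh
set -e
# Throwaway sandbox, isolated from the user's own Git configuration.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"

# Make one commit that adds a file named after the commit.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/upstream   # local "upstream" plays origin/foo
mk A; a=$(git rev-parse HEAD)
mk B; b=$(git rev-parse HEAD)
git checkout -q -b foo
mk I; mk J; mk K

# Simulated force-push: upstream is rewound to A and B is replaced by B'.
git checkout -q upstream
git reset -q --hard "$a"
echo B > B.txt; git add B.txt; git commit -q -m "B'"   # same change, new wording
mk C; mk D; mk E

plain=$(git merge-base upstream foo)               # A: B is gone from upstream's history
fork=$(git merge-base --fork-point upstream foo)   # B: still present in upstream's reflog
n_plain=$(git rev-list --count upstream..foo)      # B, I, J, K
n_fork=$(git rev-list --count "$fork"..foo)        # I, J, K
echo "merge-base: $(git log -1 --format=%s "$plain"), fork-point: $(git log -1 --format=%s "$fork")"
echo "upstream..foo: $n_plain commits, fork-point..foo: $n_fork commits"
```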

(It's worth noting a defect of sorts here: this automated fork-point detection can only find commits for as long as the corresponding reflog entries are kept, which in this case defaults to 30 days. However, that's not particularly relevant to your issue.)

In your case, you have three branch names (and hence three reflogs) involved:

origin/master, which is updated by git fetch (the first step of your git pull while on branch master),
master, which is updated by both you (via normal commits) and git rebase (the second step of your git pull), and
feature, which is updated by both you (via normal commits) and git rebase (the second step of your second git pull: you "fetch" from yourself, a no-op, then rebase feature on master).

Both rebases are run with --preserve-merges (hence in non-interactive interactive mode) and with --onto new-tip fork-point, where the fork-point commit ID is found by running git merge-base --fork-point upstream-name HEAD. The upstream-name for the first rebase is origin/master (well, refs/remotes/origin/master) and the upstream-name for the second rebase is master (refs/heads/master).
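A minimal sketch of the second rebase done by hand, assuming a throwaway repository where master has already picked up C, D and E (committed directly, standing in for the first pull). It omits --preserve-merges, since recent Git releases have removed that option in favour of --rebase-merges, and spells out the fork-point lookup and the --onto form described above:

```shell
#!/bin/sh
set -e
# Throwaway sandbox, isolated from the user's own Git configuration.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"

# Make one commit that adds a file named after the commit.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
mk A; mk B
git checkout -q -b feature
git branch --set-upstream-to=master >/dev/null
mk I; mk J; mk K
git checkout -q master
mk C; mk D; mk E              # stands in for master being brought up to date
git checkout -q feature

# Fork point of HEAD against the upstream ref, then rebase --onto new-tip fork-point.
fork=$(git merge-base --fork-point refs/heads/master HEAD)
git rebase -q --onto master "$fork"

fork_subject=$(git log -1 --format=%s "$fork")
copied=$(git log --reverse --format=%s master..feature | paste -sd ' ' -)
echo "fork point: $fork_subject; copied after E: $copied"
```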

This should all Just Work. If your commit graph at the start of the whole process is something like what you've described:

... - A - B   <-- master, origin/master
            \
              I - J - K   <-- feature

then the first fetch brings in some commits and makes origin/master point to the new tip:

              C - D - E   <-- origin/master
            /
... - A - B   <-- master, origin/master@{1}
            \
              I - J - K   <-- feature

and the first rebase then finds nothing to copy (the fork point is B = fork-point(master, origin/master), and the merge-base of master and B is just B, so there is nothing to copy), giving:

              C - D - E   <-- master, origin/master
            /
... - A - B   <-- master@{1}, origin/master@{1}
            \
              I - J - K   <-- feature

The second fetch is from your own repository and is a no-op (or skipped entirely), leaving this as the input to the second rebase. The --onto target is master, which is commit E, and the fork point of HEAD (feature) and master is also commit B, leaving commits I through K to be copied after E as usual.
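The whole walk-through can be checked end to end with a minimal sketch: a hypothetical bare repository origin.git plays the remote, a second clone pushes C, D and E, and the question's two git pull --rebase steps are then run in the workspace (without pull.rebase = preserve, to keep the sketch minimal). In this simple setup I, J and K end up on top of E, as described above:

```shell
#!/bin/sh
set -e
# Throwaway sandbox, isolated from the user's own Git configuration.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"

# Make one commit that adds a file named after the commit.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

# The asker's workspace: master tracks origin/master, feature tracks master.
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
git remote add origin ../origin.git
mk A; mk B
git push -q -u origin master
git checkout -q -b feature
git branch --set-upstream-to=master >/dev/null
mk I; mk J; mk K

# Someone else pushes C, D, E to origin (simulated with a second clone).
cd "$tmp"
git clone -q origin.git other
cd other
mk C; mk D; mk E
git push -q origin master

# The question's sequence of steps.
cd "$tmp/work"
git checkout -q master
git pull -q --rebase
git checkout -q feature
git pull -q --rebase
ahead=$(git log --reverse --format=%s master..feature | paste -sd ' ' -)
echo "feature now has, on top of master: $ahead"
```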

If some commit(s) are being dropped, something is going wrong in this process, but I can't see what.


answered Oct 15 '22 18:10

torek
