
git fetch non fast forward update

Tags: git

I know that git fetch always does a fast-forward merge between the branch and its remote-tracking branch after it fetches the commits from the remote.

My question concerns a scenario that would require git fetch to do a non-fast-forward merge. Is it possible to make git fetch do a non-fast-forward merge? If not, how do I handle the scenario below?

My local repo (I made two local commits, C and B):

...--o--o--A   <-- origin/master
            \
             C--B   <-- master 

After that I run git fetch (to update my branch)

...--o--o--A-- D  <-- origin/master (updated)
            \
             C--B   <-- master

Here, origin/master needs to be merged into master, but this won't be a fast-forward. git fetch will fail. I don't want to force the fetch, because I don't want to lose my commits C and B.

How can I make git fetch do a non-fast-forward merge? Something like this:

...--o--o--A-- D --  
            \      \
             \      F <-- master ,origin/master (updated) (my merge commit for non fast forward)
              \    /
               C--B   
Asked by Number945, Dec 05 '22 11:12


1 Answer

(Note: I started writing this early this morning, and finished it late this evening; the question was answered in between, but having done all this work I'm still going to post the answer. :-) )

TL;DR

The git fetch command never merges anything. It can update references, and is very willing to update branch-like references in a fast-forward fashion. It is more reluctant to update such references in a non-fast-forward fashion; to do so, you must force the update.

Fast-forwarding—once properly divorced from the idea of merging—is a property of a change to a reference that refers to a commit. More specifically, we're usually interested in whether a branch name value, or a remote-tracking name value, changes in a fast-forward fashion. This means that we must look at the commit graph, because it's the new place in the commit graph, combined with the commit currently selected by the reference, that determines whether the update to that reference is a fast-forward.
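To make this concrete, here is a minimal sketch that reproduces the question's scenario in throwaway repositories. The directory names (upstream, local) and the demo identity are assumptions made for illustration, and the commits are empty so that only the graph matters. It suggests that git fetch succeeds without touching master, and that the merge commit F comes from a separate git merge step.

```shell
# Throwaway identity so that commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# "upstream" plays the role of origin; it starts with commit A on master.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m A

# Clone it, then make two local commits (C and B) on master.
git clone -q "$work/upstream" "$work/local"
git -C "$work/local" commit -q --allow-empty -m C
git -C "$work/local" commit -q --allow-empty -m B

# Meanwhile upstream gains commit D.
git -C "$work/upstream" commit -q --allow-empty -m D

# The fetch only moves origin/master (a fast-forward from A to D).
before=$(git -C "$work/local" rev-parse master)
git -C "$work/local" fetch -q origin
after=$(git -C "$work/local" rev-parse master)

# The merge is a separate step; it creates a merge commit with two parents.
git -C "$work/local" merge -q -m F origin/master
parents=$(( $(git -C "$work/local" rev-list --parents -n 1 master | wc -w) - 1 ))

[ "$before" = "$after" ] && echo "fetch left master alone"
echo "parents of the new master commit: $parents"
```

In day-to-day use the two relevant commands are just git fetch origin followed by git merge origin/master, run while master is checked out.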

Long

The original claim here is wrong, in at least one important way:

I know that git fetch always does a fast-forward merge between the branch and its remote-tracking branch after it fetches the commits from the remote.

Let's take this apart a bit, so that we have the right words and phrases to use. We need to know:

  • what a reference is;
  • what a refspec is; and most importantly,
  • what it means to do a fast-forward update vs a non-fast-forward update to a reference.

This last bit also involves the force flag: each reference update can be forced, or non-forced. You may be familiar with git push --force, which sets the force flag for every reference that Git is pushing. The git fetch command has the same flag, with the same effect—but in general "all or nothing" is too broad, so Git has a way to set the force flag on a more individual basis. (The git push command has even more refinements here, but we'll only mention them in passing.)

Definition of reference and refspec

A reference, in Git, is just a name (ideally, a name that makes sense to a human) for some specific commit or other Git object.[1] References always[2] start with refs/ and mostly go on to have a second slash-delimited component that declares which kind of reference they are, e.g., refs/heads/ is a branch name, refs/tags/ is a tag name, and refs/remotes/ is a remote-tracking name.[3]

(The references we care about here, for deciding whether some update is or is not a fast-forward, are those that we would like to behave in a "branch-y manner": those in refs/heads/ and those in refs/remotes/. The rules we will discuss in a moment could be applied to any reference, but are definitely applied to these "branch-y" references.)

If you use an unqualified name like master where Git either requires or could use a reference, Git will figure out the full reference, using the six-step procedure outlined near the beginning of the gitrevisions documentation to resolve the abbreviated name to a full name.[4]
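One way to see this resolution in action is git rev-parse --symbolic-full-name, which prints the full reference that an abbreviated name resolves to. The sketch below uses a throwaway repository; the tag name v1 and the demo identity are assumptions made for illustration.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)

git init -q "$repo"
# Force the branch name to master regardless of the local default.
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" commit -q --allow-empty -m "first"
git -C "$repo" tag v1

# Ask Git which full reference each short name resolves to.
full_branch=$(git -C "$repo" rev-parse --symbolic-full-name master)
full_tag=$(git -C "$repo" rev-parse --symbolic-full-name v1)

echo "$full_branch"   # refs/heads/master
echo "$full_tag"      # refs/tags/v1
```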

A refspec, in Git, is mostly a pair of references separated by a colon (:) character, with an optional leading plus sign +. The reference on the left side is the source, and the reference on the right is the destination. We use refspecs with git fetch and git push, which connect two different Git repositories. The source reference is meant for the use of whichever Git is sending commits and other Git objects, and the destination is meant for the use of the receiving Git. For git fetch in particular, then, the source is the other Git, and the destination is ourselves.

If a reference in a refspec is not fully qualified (does not start with refs/), Git can use the process above to qualify it. If both references in a single refspec are unqualified, Git has some code in it to attempt to put them both into an appropriate name-space, but I've never trusted this code very much. It's not clear to me, for instance, who really qualifies the source and destination during a fetch: there are two Gits involved, but the other Git usually sends us a complete list of all of their references, so our Git could do the resolving using this list. It's obviously wiser to use fully-qualified references here, though, in case their set of references does not match your own expectations: if they have only a refs/tags/xyz and you were expecting xyz to expand to refs/heads/xyz, you can be surprised when it doesn't.

In any refspec, you can omit either the source or the destination part. To omit the destination, you write the refspec without a colon, e.g., refs/heads/br. To omit the source, you write the refspec with a colon, but with nothing where the source part would go, e.g., :refs/heads/br. What it means when you do these things varies: git fetch treats them very differently from git push. For now, just note that there are the source and destination parts, with the option of omitting them.

The leading plus, if you choose to use it, always goes at the front. Hence git push origin +:refs/heads/br is a push with the force flag set, of an empty source, to the destination refs/heads/br, which is fully-qualified. Since this is a push, the source represents our Git's name (none) and the destination represents their Git's name (a branch named br). The similar-looking string +refs/heads/br has the force flag set, has a fully-qualified source, and has no destination. If we were concerned with git push we could look at the meanings of these two refspecs for push, but let's move on now.
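For a real refspec to look at, a fresh clone stores one under remote.origin.fetch. The sketch below, which assumes default clone settings and uses made-up directory names, reads that refspec and splits it into the force flag, source, and destination with plain shell parameter expansion.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/upstream"
git -C "$work/upstream" commit -q --allow-empty -m "first"
git clone -q "$work/upstream" "$work/local"

# The refspec that git clone wrote into the new repository's configuration.
refspec=$(git -C "$work/local" config --get remote.origin.fetch)

# A leading plus sign is the force flag.
case $refspec in
  +*) forced=yes ;;
  *)  forced=no ;;
esac
src=${refspec#+}      # drop the optional leading plus
src=${src%%:*}        # keep what is left of the colon: the source
dst=${refspec#*:}     # keep what is right of the colon: the destination

echo "refspec: $refspec"
echo "forced:  $forced"
echo "source:  $src"
echo "dest:    $dst"
```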


[1] Any branch-like reference must point to a commit. Tag names may point to any object. Other reference names may have other constraints.

[2] There's some internal disagreement within Git itself whether every reference must be spelled, in its full-name form, as something matching refs/*. If that were the case, HEAD would never be a reference. In fact, the special names like HEAD and ORIG_HEAD and MERGE_HEAD sometimes act like normal references, and sometimes don't. For myself, I mostly exclude these from the concept of reference, except whenever it's convenient to include them. Each Git command makes up its little Gitty mind about how and whether to update these *_HEAD names, so there's no formal systematic approach like there is (or mostly is, given the other weird special cases that crop up in some commands) for refs/ style references.

[3] There are more well-known sub-spaces: e.g., refs/replace is reserved for git replace. The idea here, though, is simple enough: refs/ is followed by another human-readable string that tells us what kind of reference this particular reference is. Depending on the kind, we might demand yet another sub-space, as is the case in refs/remotes/ where we next want to know: which remote?

[4] Some Git commands know, or assume, that an abbreviated reference must be a branch name or a tag name. For instance, git branch won't let you spell out refs/heads/ in some places: it just rudely shoves refs/heads/ in on its own, since it only works on branch names. The six-step procedure is generally used when there's no clear "must be a branch name" or "must be a tag name" rule.


The commit graph

Before we can define what it means to do a fast-forward update, we need to look at the commit graph. Fast-forward vs non-fast-forward only makes sense in the context of commits and the commit graph. As a result, it only makes sense for references that refer specifically to commits. Branch-like names—those in refs/heads/ and those in refs/remotes/—do always point to commits, and those are the ones we care about here.

Commits are uniquely identified by their hash ID.[5] Every commit also stores some set of parent commit hash IDs. Most commits store a single parent ID; we say that such a commit points to its parent commit. These pointers make up a backwards-looking chain, from most-recent commit to oldest:

A  <-B  <-C

for instance, in a tiny repository with just three commits. Commit C has commit B as its immediate parent, so C points to B. Commit B has commit A as its immediate parent, so B points to A. A is the very first commit made, so it has no parent: it is a root commit and it points nowhere.

These pointers form an ancestor / descendant relationship. We know that these pointers always look backwards, so we don't need to draw the internal arrows. We do need something to identify the tip commit of the data structure, though, so that Git can find the ends of these chains:

o--o--C--o--o--o--G   <-- master
       \
        o--o--J   <-- develop

Here master points to some commit G, and develop points to J. Following J backwards, or G backwards, eventually leads to commit C. Commit C is therefore an ancestor of commits G and J.

Note that G and J have no parent/child relationship with each other! Neither is a descendant of the other, and neither is a parent of the other; they merely have some common ancestor once we go far enough back in time / history.
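The relationships in this drawing can be checked with git merge-base. The sketch below rebuilds the same shape in a throwaway repository out of empty commits (the commit messages and the demo identity are assumptions for illustration), then asks Git for the common ancestor and tests each tip against the other with --is-ancestor.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master

# Three commits ending at C, where the two branches fork.
for m in A B C; do git commit -q --allow-empty -m "$m"; done
c=$(git rev-parse HEAD)

# develop: three more commits ending at J.
git checkout -q -b develop
for m in H I J; do git commit -q --allow-empty -m "$m"; done

# master: four more commits ending at G.
git checkout -q master
for m in D E F G; do git commit -q --allow-empty -m "$m"; done

# The common ancestor of the two tips should be commit C.
base=$(git merge-base master develop)

# Neither tip is an ancestor of the other.
if git merge-base --is-ancestor master develop; then g_anc_of_j=yes; else g_anc_of_j=no; fi
if git merge-base --is-ancestor develop master; then j_anc_of_g=yes; else j_anc_of_g=no; fi

[ "$base" = "$c" ] && echo "merge base is C"
echo "G ancestor of J: $g_anc_of_j, J ancestor of G: $j_anc_of_g"
```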


[5] In fact, every Git object is uniquely identified by its hash ID. This is, for instance, how Git stores only one copy of some file's contents even when that particular version of that one file gets stored in dozens or thousands of commits: commits that don't change the file's contents can re-use the existing blob object.


Definition of fast-forward

Fast-forward-ness is a property of moving a label. We can move the existing names (master and develop) around, but let's avoid doing so for a moment. Suppose, instead, we add a new name, and make it point to commit C. Let's add one-letter hash IDs for the rest of the commits as well:

        ............ <-- temp
       .
A--B--C--D--E--F--G   <-- master
       \
        H--I--J   <-- develop

We can now ask Git to move the new name from commit C to any other commit.

When we do so, we can ask another question about this move. Specifically, temp currently points to commit C. We pick another ID out of the A-through-J universe of possible commits and tell Git to move temp so that it points to this newly selected commit. Our question is simple: Is the new commit a descendant of the commit to which the label points right now?

If this label-move results in the name temp pointing to a commit that is a descendant of C, this move is a fast-forward. If not—if we pick commit B or A—this move is not a fast-forward.

That's it—that's all a fast-forward is. It's the answer to the question of whether this update to this label, that we are about to do right now, results in the label moving forward along some chain of our backwards-pointing commits.
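The question "is the new commit a descendant of the current one?" maps directly onto git merge-base --is-ancestor. Here is a minimal sketch under stated assumptions: is_fast_forward and mk are hypothetical shell helpers (not Git commands), and each commit gets a lightweight tag named after its letter so it can be referred to by name. Note that --is-ancestor also reports success when both commits are the same, which would be a no-op move.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: make an empty commit and tag it with its letter.
mk() { git commit -q --allow-empty -m "$1" && git tag "$1"; }

for m in A B C D E F G; do mk "$m"; done     # master: A..G
git checkout -q -b develop C
for m in H I J; do mk "$m"; done             # develop: C, then H..J
git branch temp C                            # temp points to C

# Hypothetical helper: moving a label from commit $1 to commit $2 is a
# fast-forward when $1 is an ancestor of $2.
is_fast_forward() { git merge-base --is-ancestor "$1" "$2"; }

result=""
for target in A B E G J; do
  if is_fast_forward temp "$target"; then verdict=yes; else verdict=no; fi
  echo "temp -> $target: fast-forward? $verdict"
  result="${result:+$result }$target:$verdict"
done
```

Moving temp to E, G, or J comes out as a fast-forward, while moving it back to A or B does not.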

The reason this is particularly interesting for branch names—names in the refs/heads/ space—is that git commit creates a new commit whose parent is the current commit, and adds this new commit to the graph—and then updates the current branch name to point to the new commit it just made. A repeated series of git commit operations therefore results in a one-step-at-a-time forward motion of the branch label. For instance, if we check out develop and make two new commits, we get:

A--B--C--D--E--F--G   <-- master
       \
        H--I--J--K--L   <-- develop

with the name develop now pointing to the second of these new commits.

If, while fiddling with temp, we made our branch-name temp point to commit J, we could now fast-forward temp to point to commit L. Because L points back to K which points back to J, all Git operations that follow these chains will treat commit J as still being "on" branch temp. So fast-forwarding is interesting because it means we don't "lose" commits.

On the other hand, if we made temp point instead to E, moving temp now to point to K will "lose" commits D and E from branch temp. Those commits are still safely on master, so they are still protected here. If they weren't on master any more for some reason—for instance, if we did something odd or unusual to master such as deleting the branch name—then commits D and E would be protected via the name temp up until the point we yank temp around in a non-fast-forward fashion. If temp is the only name protecting those commits from the garbage collector, they become vulnerable.

Comparing fast-forwarding to what "to merge" means, as a verb

Git does have something it calls a fast-forward merge. I dislike the phrase "fast-forward merge" since it's not really a merge at all—it's much more like just running git checkout, except for the fact that a branch name moves. But the git merge documentation uses the phrase, after more formally saying that some merge resolves as a fast-forward, so we have to be able to interpret it.

A fast-forward merge in Git results from running a git merge other where other is a commit that is strictly ahead of (i.e., is a descendant of) the current or HEAD commit in the graph. This means that the branch name to which HEAD is attached can be moved in a fast-forward fashion. For instance, with branch name temp pointing to commit C, we could run:

git checkout temp
git merge <hash-of-commit-E>

Git will realize that moving the label temp from commit C to commit E is a fast-forward operation on that label. The primary thing that allows us to use the verb merge here is the fact that we just used git merge to achieve it: the git merge command therefore updates our index and work-tree as well as doing the fast-forward operation.

But this is just git merge borrowing the concept of fast-forwarding. Fast-forwarding is not itself a "merge-y" concept. If you run a different git merge other where other is not a descendant of the current commit, but is a descendant of some common ancestor of the current commit (i.e., of a merge base), then git merge performs a true merge, using your index and work-tree as areas in which to do the merging. That is a merge, an operation that really fills the shoes of the verb phrase "to merge".

(We have no such commit in our graph—we'd have to make a child of A or B, after which commit A or commit B would be the merge base.)
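The difference between the two outcomes of git merge can be observed by counting commits. The sketch below assumes default merge settings (no merge.ff override) and uses a throwaway repository of empty commits; the variable names are illustrative. A merge that resolves as a fast-forward should leave the commit count unchanged, while a true merge adds one commit with two parents.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master

for m in A B C; do git commit -q --allow-empty -m "$m"; done
git branch temp                               # temp points to C
git checkout -q -b develop
for m in H I J; do git commit -q --allow-empty -m "$m"; done
git checkout -q master
for m in D E; do git commit -q --allow-empty -m "$m"; done
e=$(git rev-parse HEAD)                       # remember commit E
for m in F G; do git commit -q --allow-empty -m "$m"; done

# Case 1: E is a descendant of C, so this merge is just a label move.
git checkout -q temp
count_before=$(git rev-list --count --all)
git merge -q "$e"
count_after_ff=$(git rev-list --count --all)
temp_now=$(git rev-parse temp)

# Case 2: G and J have diverged, so this is a true merge.
git checkout -q master
git merge -q -m "merge develop" develop
count_after_merge=$(git rev-list --count --all)
merge_parents=$(( $(git rev-list --parents -n 1 master | wc -w) - 1 ))

echo "commits: $count_before -> $count_after_ff (fast-forward) -> $count_after_merge (true merge)"
echo "parents of the merge commit: $merge_parents"
```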

Neither git fetch nor git push ever merge

As we just noted, a real merge requires—at least potentially—the use of the index and work-tree. The git fetch command does not touch the index and work-tree. A git push is often done to a --bare repository, which does not even have a work-tree!

A git fetch or git push operation can do fast-forwarding. Since fast-forwarding isn't merging, this doesn't contradict our "never merge" claim. A git fetch or git push operation can also do non-fast-forward operations on reference names, including branch names, but to do so, the force flag must be enabled on that particular operation.

(The git push command offers not just "plain" and "force" but also "force-with-lease", which is analogous to a compare-and-swap or CAS instruction in multithreaded programming. The fetch command does not have this CAS option; it has only plain-or-forced.)

How git fetch uses refspecs to update references

The git fetch command has (at least, depending on how you count) two parts:

  • transfer commits (and other Git objects) from another Git into our Git, augmenting our commit graph;
  • optionally, update some references in our repository.

It has the side effect of writing everything it knows about new commits into .git/FETCH_HEAD, which is a special file that definitely is not a reference—there's never any ambiguity about this, unlike HEAD—but does contain hash IDs (plus extra information about what our Git saw from the other Git). The rest of Git can use the data left in this file, even if git fetch does not update any references.
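A small sketch of this side effect, assuming throwaway repositories with made-up names: fetching straight from a path, with a source-only refspec and no configured remote, should leave the local repository with no references at all while still recording the fetched hash ID in FETCH_HEAD.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m A
theirs=$(git -C "$work/upstream" rev-parse refs/heads/master)

# An empty repository with no remote configured.
git init -q "$work/local"

# Source-only refspec, fetched by path: nothing local is asked to be updated.
git -C "$work/local" fetch -q "$work/upstream" refs/heads/master

fetched=$(git -C "$work/local" rev-parse FETCH_HEAD)
ref_count=$(( $(git -C "$work/local" for-each-ref | wc -l) ))

[ "$fetched" = "$theirs" ] && echo "FETCH_HEAD records their master"
echo "local references after the fetch: $ref_count"
```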

Now, remember that a refspec can list both a source reference and a destination reference, or just a source, or just a destination. It can also have a leading + sign to indicate "force if necessary".

Looking specifically at git fetch, then, when dealing with what is to happen in the second half, we have these three possible cases:

  • refspec with both source and destination: use the source to locate a name in the other Git repository; use the destination to choose a name to update in our own repository.
  • refspec with source but no destination: use the source to locate a name in the other Git repository, but don't update any local name (but see below).
  • refspec with destination but no source: error.

In very old versions of Git (those before Git version 1.8.4), a git fetch operation simply obeys whatever refspecs you give it on the command line. If you give it no refspecs, it uses and obeys the remote.<remote>.fetch directives in the configuration. That is, in these old versions of Git, running git fetch origin xyz fetches whatever reference xyz matches, and since there is no destination, this updates no reference in our own repository! (The command still writes information to .git/FETCH_HEAD, as it always does.) Note that xyz might be a tag: the other Git might find refs/tags/xyz and not refs/heads/xyz. We did not specify; if we want to be sure to fetch a branch we need to specify refs/heads/.

If your Git is at least version 1.8.4, though, when git fetch brings over a branch name, Git does an opportunistic update using your remote.<remote>.fetch settings. So, assuming the normal remote.origin.fetch setting, git fetch origin refs/heads/xyz:

  • updates nothing, because of the empty destination part;
  • but then updates refs/remotes/origin/xyz, because of the fetch setting.
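The two bullet points above can be observed with a sketch like the following, which assumes Git 1.8.4 or later and the default clone refspec. The repository and branch names are made up for illustration. The branch xyz is created upstream only after the clone, so the remote-tracking name does not exist until the source-only fetch runs.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m A
git clone -q "$work/upstream" "$work/local"

# After the clone, upstream grows a new branch xyz with one commit.
git -C "$work/upstream" checkout -q -b xyz
git -C "$work/upstream" commit -q --allow-empty -m X
theirs=$(git -C "$work/upstream" rev-parse refs/heads/xyz)

# We have no remote-tracking name for xyz yet.
if git -C "$work/local" rev-parse -q --verify refs/remotes/origin/xyz >/dev/null; then
  had_before=yes
else
  had_before=no
fi

# Source-only refspec: no destination was named on the command line.
git -C "$work/local" fetch -q origin refs/heads/xyz

# The opportunistic update should have created refs/remotes/origin/xyz anyway.
ours=$(git -C "$work/local" rev-parse refs/remotes/origin/xyz)
echo "origin/xyz before fetch: $had_before"
[ "$ours" = "$theirs" ] && echo "origin/xyz now matches their xyz"
```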

Once git fetch gets around to doing all of its updates, each update:

  • can succeed because the rules for this kind of reference allow the update, or
  • can fail because the rules don't allow it and the force flag is not set; or
  • can succeed because even though the rules don't allow it, the force flag is set.

Suppose, then, that we run:

git fetch origin refs/heads/xyz:refs/heads/abc

and that there is a refs/heads/xyz on the other Git at origin. Suppose further that our Git is at least 1.8.4 and we have the usual refspec in remote.origin.fetch. Then our Git:

  1. Brings over the commits that go with their Git's refs/heads/xyz if necessary.
  2. Attempts to update our refs/heads/abc. This update is not forced. This update is due to what we told our Git on the command line.
  3. Attempts to update our refs/remotes/origin/xyz. This update is forced. This update is due to what we told our Git through remote.origin.fetch.

Since both refs/heads/ and refs/remotes/ are branch style name-spaces, our Git (which we know is at least 1.8.4) follows the branch update rules here.[6] These tell Git that an update is automatically allowed if it's a fast-forward.

For item 2 here, the name to be updated is refs/heads/abc (because that's on the right side of the refspec on the command line). Again, fast-forward here has nothing to do with merging: Git just checks whether the current value of refs/heads/abc is an ancestor of the proposed new value of refs/heads/abc. If so, this update is allowed. If not, it's not.

For item 3, the name to be updated is refs/remotes/origin/xyz (because the name matched on the left was refs/heads/xyz and the default refspec reads +refs/heads/*:refs/remotes/origin/*). This refspec has the force flag set, so the update to refs/remotes/origin/xyz will happen. It will be a normal, fast-forward, non-forced update if the change is a fast-forward. It will be a non-fast-forward forced update if the change is a non-fast-forward.
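Here is a sketch of item 2 going wrong and then being forced, under the assumption of throwaway repositories with made-up names. The local branch abc is deliberately given a commit that their xyz does not contain, so the unforced command-line refspec should be rejected and the same refspec with a leading plus should go through.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Upstream: master at A, xyz at X (a child of A). Leave master checked out.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m A
git -C "$work/upstream" checkout -q -b xyz
git -C "$work/upstream" commit -q --allow-empty -m X
git -C "$work/upstream" checkout -q master
theirs=$(git -C "$work/upstream" rev-parse refs/heads/xyz)

# Local: abc gets its own commit L, so abc and their xyz have diverged.
git clone -q "$work/upstream" "$work/local"
cd "$work/local"
git checkout -q -b abc
git commit -q --allow-empty -m L
git checkout -q master          # abc must not be the current branch
abc_before=$(git rev-parse abc)

# Unforced: not a fast-forward, so the update to abc is rejected.
if git fetch -q origin refs/heads/xyz:refs/heads/abc 2>/dev/null; then
  plain=accepted
else
  plain=rejected
fi
abc_after_plain=$(git rev-parse abc)

# Forced with a leading plus: abc is moved, dropping commit L from abc.
git fetch -q origin +refs/heads/xyz:refs/heads/abc
abc_after_forced=$(git rev-parse abc)

echo "plain refspec: $plain"
[ "$abc_after_forced" = "$theirs" ] && echo "forced refspec: abc now matches their xyz"
```

The forced variant is exactly the case the questioner wants to avoid: commit L is no longer reachable from abc afterwards.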


[6] In Git 1.8.2 and earlier, Git accidentally applied the branch update "must be a fast-forward operation" rules to tag names as well. In Git 1.8.4, this was fixed. However, a new bug was introduced at some point. The code inside Git to update references during git fetch is horrible and twisty and I think probably should be thrown away and re-coded from scratch, but actually doing that is a nightmare of its own.


There's one more special constraint in git fetch

We noted in passing above that the special name HEAD, which probably is not a reference, is usually attached to some branch name. When your HEAD is attached to some branch, that branch is your current branch. That's the internal definition of what it means to have that branch as your current branch: the branch's name has to be inside the .git/HEAD file.

By default, git fetch refuses to update this branch name. That is, if HEAD is attached to master, git fetch simply won't update refs/heads/master. Running git fetch origin refs/heads/master:refs/heads/master will fail to update your refs/heads/master. After you git checkout some other branch, attaching HEAD to develop for instance, then git fetch is willing to update master, and now you can run git fetch origin master:master (assuming you prefer the shorter, slightly riskier, unqualified spelling) if you like.[7]
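This constraint can be seen with a sketch like the one below, which uses throwaway repositories with made-up names. While master is checked out, the fetch into refs/heads/master should be refused; after switching to a new branch develop, the same update is a plain fast-forward and should succeed.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m A
git clone -q "$work/upstream" "$work/local"

# Upstream moves ahead by one commit.
git -C "$work/upstream" commit -q --allow-empty -m D
theirs=$(git -C "$work/upstream" rev-parse refs/heads/master)

cd "$work/local"                 # HEAD is attached to master here

# Refused: master is the current branch.
if git fetch -q origin refs/heads/master:refs/heads/master 2>/dev/null; then
  on_master=updated
else
  on_master=refused
fi

# Attach HEAD to another branch, then try again with the short spelling.
git checkout -q -b develop
git fetch -q origin master:master
master_now=$(git rev-parse refs/heads/master)

echo "while on master: $on_master"
[ "$master_now" = "$theirs" ] && echo "while on develop: master fast-forwarded"
```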

The reason for this special constraint has to do with the difference we noted above about how git merge does a merge that resolves in a fast-forward: git merge updates the index and work-tree, as if you ran git checkout. The git fetch command never updates the index and work-tree. If git fetch allowed you to fast-forward your master to a new commit, your index and work-tree could get out of whack.

The problem here is that your index and work-tree are intended to match your current commit, except for any work that you have done since you ran git checkout to change your index and work-tree. If git fetch updates the refs/heads/ space branch-name to which your HEAD is attached, your index and work-tree no longer match your current commit, because your current commit is the one whose hash ID is stored in that branch-name. (If you do manage to get into this state, it's annoying to fix, though it is possible. See "Why does Git allow pushing to a checked-out branch in an added worktree? How shall I recover?")

The git fetch command has a flag, --update-head-ok, that specifically overrides this check. You should not use it. The git pull code does use it, because git pull immediately runs a second Git command that will correct the index and work-tree even in these special cases. Moreover, git pull does some pre-fetch checks to make sure that that second command won't wreck everything. Unless you know exactly what you are doing, though, you should not use it.


[7] If you do this, you're just making extra mental work for yourself, in general. I recommend not doing this as everyday practice. Instead, use git fetch origin && git checkout master && git merge --ff-only. I defined an alias, git mff, that runs git merge --ff-only, which I use to do these things.
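A sketch of that alias and workflow, assuming Git 2.0 or later (where git merge with no commit argument merges the current branch's upstream) and throwaway repositories with made-up names. The alias is written to the repository's own configuration here; adding --global to the git config command would make it available everywhere.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m A
git clone -q "$work/upstream" "$work/local"

cd "$work/local"
git checkout -q -b develop       # we are working on some other branch

# Upstream moves ahead by one commit.
git -C "$work/upstream" commit -q --allow-empty -m D
theirs=$(git -C "$work/upstream" rev-parse refs/heads/master)

# Define the alias: "git mff" runs "git merge --ff-only".
git config alias.mff "merge --ff-only"

# Fetch, switch to master, and fast-forward it to its upstream (origin/master).
git fetch -q origin && git checkout -q master && git mff -q
master_now=$(git rev-parse refs/heads/master)

[ "$master_now" = "$theirs" ] && echo "master fast-forwarded to origin/master"
```

Because of --ff-only, the last step fails instead of creating a merge commit if master has local commits of its own, which is the situation in the question; a plain git merge origin/master is what produces the merge commit there.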

Answered by torek, Dec 27 '22 09:12