Learning Git: tracking vs. setting upstream (-u) for remotes?

Tags:

git

github

I am learning Git and am attempting to understand the difference between "tracking" a remote and defining an "upstream" relationship with it (with the -u flag).

For master to origin/master, I have been using

git push origin master

which seems to automatically define a relationship (although I am not sure what it is).

For branches, I have been using

git branch newbranch
git push -u origin newbranch

I know this sets an upstream relationship, but again I don't understand the distinction.

Can someone explain the difference?

asked Nov 03 '15 15:11 by eabates



1 Answer

Both answers here are correct, but I'll describe the underlying mechanism, because until I found out what it was, I found the whole notion of "tracking" quite mysterious.

Git breaks up this "tracking" information into two parts: the name of the remote (usually the word origin, like you're using) and the name that git commands on that remote use to name the branch.[1] In other words, if you have login access to the remote, and you log in over there and go into the repository, you might run git log master to see what has been committed; master is the name in question.

If you peek into your .git/config file, you will see, for each local branch that's "tracking" something, these two parts. For instance, let's say you had a local branch named experiment that you have set up to track origin/master. This would result in the following (Git records the branch as the full ref name, refs/heads/master):

[branch "experiment"]
    remote = origin
    merge = refs/heads/master

But there is one more part to this tracking-branch stuff: when you run git fetch origin, and there's something new on branch master on origin, the fetch step updates your local origin/master. This name (the remote name origin first, then a slash /, then the branch name as it appears on the remote) is how you can see what's happened on the remote. When git fetch runs, it copies the remote's branch names (and the SHA-1s of their branch tips) into your local repository, renaming them with the remote name in front.

It's actually the git fetch step that updates origin/master and so on, and only once that's done, does this "tracking" stuff have any useful effect. Git can now tell you that you're ahead and/or behind by some number of commits. And, you can now run a command like git log origin/master to see what's happening there—or more interestingly, git log --oneline master..origin/master to see "their" commits that you don't have yet: essentially, "what fetch brought in"—and git log --oneline origin/master..master to see "your" commits that they don't have yet. (If you've already done a merge or rebase, it's too late to see what your fetch brought in, because now you have what they had then, as a result of your merge-or-rebase.)
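To make the ahead/behind idea concrete, here is a minimal sketch that builds a throwaway "remote" and two repositories under a temporary directory. The directory names (remote.git, theirs, mine) and the commit helper are made up for illustration; only the git commands themselves come from the discussion above.

```shell
#!/bin/sh
# Sketch: a throwaway bare "remote" and two repositories under a temp directory.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Hypothetical helper: empty commit with a fixed identity (no global config needed).
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c commit.gpgsign=false commit -q --allow-empty -m "$1"
}

git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# "theirs" seeds the remote with one commit on master.
git init -q theirs
cd theirs
git symbolic-ref HEAD refs/heads/master
commit "base"
git remote add origin "$tmp/remote.git"
git push -q origin master
cd "$tmp"

# "mine" is a clone, so its master already tracks origin/master.
git clone -q "$tmp/remote.git" mine

# They publish two more commits; I add one local commit.
(cd theirs && commit "theirs 1" && commit "theirs 2" && git push -q origin master)
cd mine
commit "mine 1"

# Until I fetch, my origin/master is stale and nothing looks incoming.
before=$(git rev-list --count master..origin/master)
git fetch -q origin
incoming=$(git rev-list --count master..origin/master)   # their commits I lack
outgoing=$(git rev-list --count origin/master..master)   # my commits they lack
echo "before fetch: $before; incoming: $incoming; outgoing: $outgoing"
```

The count before the fetch is zero even though the remote has moved on, which is the point: origin/master is only as fresh as your last fetch.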

The oddball in all of this is git pull. The git pull command is really just a short-cut that runs git fetch first, and then runs git merge (or, if you redirect it, git rebase). To do these steps separately, you run git fetch origin, then git merge origin/master or git rebase origin/master. For historical reasons,[2] git pull takes the remote's name for the branch, in this case master, rather than the name it winds up being renamed to in your repository.
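The two separate steps can be sketched as follows, again in a scratch setup. The repository names and the commit helper are assumptions for the demo, and --ff-only is just a cautious choice here so the merge never needs to create a commit.

```shell
#!/bin/sh
# Sketch: do by hand what "git pull origin master" would do in one go.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Hypothetical helper: empty commit with a fixed identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c commit.gpgsign=false commit -q --allow-empty -m "$1"
}

git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

git init -q theirs
cd theirs
git symbolic-ref HEAD refs/heads/master
commit "base"
git remote add origin "$tmp/remote.git"
git push -q origin master
cd "$tmp"

git clone -q "$tmp/remote.git" mine
(cd theirs && commit "theirs 1" && git push -q origin master)

cd mine
git fetch -q origin                      # step 1: only origin/master moves
behind=$(git rev-list --count master..origin/master)
git merge -q --ff-only origin/master     # step 2: uses the local name origin/master
mine_tip=$(git rev-parse master)
their_tip=$(git rev-parse origin/master)
echo "behind after fetch: $behind"
```

Note the naming difference: the merge step names origin/master (your local copy), whereas git pull origin master would name master as the remote sees it.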


So, with that as background, let's look at some commands:

  • git fetch remote: This doesn't need any branch names at all. It calls up the given remote, asks it where all its branches are now, and updates your repository, recording all these updates underneath the origin/ names (so as not to affect any of your local branches). In other words, this updates the names your branches may (or may not) be tracking, but it doesn't need to know anything about what is or isn't tracking what.

  • git status: If it says you're "on branch X", and branch X is tracking origin/X, git status can also tell you if you have, on your X, commits that are not on origin/X, and vice versa.

  • git merge and git rebase: These need some way to know what to merge, or what to rebase onto. You can name it explicitly, but if you tell your git that your branch X is tracking origin/X, then whenever you're on branch X, git merge or git rebase will know what to do.

  • git branch --set-upstream-to origin/X: This is the main command that sets or changes what your current branch is tracking. In other words, if you're on branch X now, this updates branch.X.remote and branch.X.merge for you, so that you don't have to use two separate git config commands. You can also use git branch --unset-upstream to remove the tracking information.

  • git push: if you give it no additional information, it uses the current branch's "remote"—the first half of its tracking info—to decide which remote to call up. Whether or not you give git push a remote name, the next part depends on whether you give it a "refspec". If you don't, git push uses push.default to decide what refspec to use.
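The effect of --set-upstream-to and --unset-upstream on the two config entries can be checked directly. This is a minimal sketch in a scratch repository; the paths and the commit helper are hypothetical, and it assumes push.autoSetupRemote plays no role because an explicit refspec is given.

```shell
#!/bin/sh
# Sketch: watch branch.master.remote and branch.master.merge come and go.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Hypothetical helper: empty commit with a fixed identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c commit.gpgsign=false commit -q --allow-empty -m "$1"
}

git init -q --bare remote.git
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
commit "base"
git remote add origin "$tmp/remote.git"

# A plain push creates origin/master locally but records no upstream.
git push -q origin master
before=$(git config --get branch.master.remote || echo "unset")

# One command instead of two separate git config calls.
git branch --set-upstream-to origin/master >/dev/null
remote=$(git config --get branch.master.remote)
merge=$(git config --get branch.master.merge)

git branch --unset-upstream
after=$(git config --get branch.master.remote || echo "unset")

echo "before: $before; remote: $remote; merge: $merge; after: $after"
```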

Wait, what's a refspec?

The second-simplest form of a refspec is just two branch names with a colon between them, like master:master. For git push, the name on the left is your branch name, and the name on the right is their—the other git's—branch name. If you omit the : you get the simplest form, where the remote-side name—the one that would follow the :—is chosen by a somewhat complicated process (described in the git push documentation), which actually depends on more configuration variables and whether you've set an upstream.

What about git push -u? That's just a convenient shortcut: in much the way that git branch --set-upstream-to is a shortcut for doing two git config commands, git push -u refspec is a shortcut for doing a push, and then doing a git branch --set-upstream-to as well. You must give push a refspec for this to do anything useful.

What if you give a "half refspec" like master? Well, as noted above, the name your git chooses to give to the remote's git is found by a complicated process, but if you haven't set an upstream yet (which is fairly likely if you're doing git push -u in the first place) it's going to be the same as your local name. So git push -u origin master probably "means" git push -u origin master:master, and that then means git branch --set-upstream-to origin/master, in the end.

If you give a fuller refspec, like git push -u origin experiment:feature, this will push your experiment branch to origin, asking origin to call it feature, and then do a --set-upstream-to origin/feature. Note that at this point, the upstream name of your local branch differs from the local name. Git is fine with this; just be sure you are, too. :-)
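Here is a small sketch of that last case, where the local and upstream names differ. The branch names experiment and feature are the ones from the paragraph above; the scratch paths and the commit helper are made up for the demo.

```shell
#!/bin/sh
# Sketch: push experiment as feature and record the upstream in one step.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Hypothetical helper: empty commit with a fixed identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c commit.gpgsign=false commit -q --allow-empty -m "$1"
}

git init -q --bare remote.git
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
commit "base"
git remote add origin "$tmp/remote.git"
git branch experiment

# Left of the colon: my branch name. Right of the colon: their branch name.
git push -q -u origin experiment:feature >/dev/null

remote=$(git config --get branch.experiment.remote)
merge=$(git config --get branch.experiment.merge)
upstream=$(git rev-parse --abbrev-ref 'experiment@{upstream}')
echo "remote: $remote; merge: $merge; upstream: $upstream"
```

The merge entry holds their name for the branch, while the upstream as seen from your side is the renamed origin/feature.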

There are more clever tricks that git has:

  • If you run git checkout branch and branch does not yet exist, and there's a single "obvious" remote-tracking branch such as origin/branch, git will create a new, local branch that is already tracking origin/branch. (That is, the local branch will have its remote set to origin and its merge set to branch.)

  • If you run git branch local-name remote-tracking-name, git will automatically set up the local branch to track the corresponding remote-tracking branch. (You can configure git as to whether you want this, but that's the default.)
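Both tricks can be tried in a scratch clone. This sketch assumes default settings (checkout guessing and branch.autoSetupMerge left at their defaults); the branch names topic and review, the paths, and the commit helper are hypothetical.

```shell
#!/bin/sh
# Sketch: two ways git sets up tracking for you without being asked.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Hypothetical helper: empty commit with a fixed identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c commit.gpgsign=false commit -q --allow-empty -m "$1"
}

git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
commit "base"
git branch topic
git remote add origin "$tmp/remote.git"
git push -q origin master topic
cd "$tmp"

git clone -q "$tmp/remote.git" mine
cd mine

# No local topic exists yet; origin/topic is the single obvious candidate.
git checkout -q topic
guessed=$(git rev-parse --abbrev-ref 'topic@{upstream}')

# Starting a branch from a remote-tracking name also sets its upstream.
git branch review origin/topic >/dev/null
explicit=$(git rev-parse --abbrev-ref 'review@{upstream}')

echo "topic tracks $guessed; review tracks $explicit"
```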

Summary: git fetch updates the things that tracking uses (the origin/* entries, for remote origin). Once that's done (including if it's done by using git pull, which runs git fetch[3]), you see more information from commands like git status; and commands like git rebase use it to know how to do the rebase, without your having to tell it anything more.

There's one more interesting twist: any branch's "upstream" can be in your own local repository. To get this, you set the remote for that branch to . (a literal dot), and the merge to the name of the branch. You don't have to know how to do this, because you can do git branch --set-upstream-to master, for instance, to make your current branch track your own master.
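A quick sketch of the local-upstream case, needing no remote at all. The branch name feature and the commit helper are made up for the demo.

```shell
#!/bin/sh
# Sketch: make a branch track another branch in the same repository.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Hypothetical helper: empty commit with a fixed identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c commit.gpgsign=false commit -q --allow-empty -m "$1"
}

git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
commit "base"

git checkout -q -b feature
git branch --set-upstream-to master >/dev/null

remote=$(git config --get branch.feature.remote)   # the literal dot
merge=$(git config --get branch.feature.merge)
upstream=$(git rev-parse --abbrev-ref '@{upstream}')
echo "remote: $remote; merge: $merge; upstream: $upstream"
```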

"Incoming" and "outgoing"

Mercurial users may wonder how you can get the effect of hg incoming or hg outgoing. The former tells you what your upstream has, that you don't. The latter tells you what you have, that they don't. As it turns out, this is easy to do in modern git, because git has a special syntax, @{u}, to find the current branch's upstream.

In other words, if you're on master and master tracks origin/master, @{u} (which you can spell out as @{upstream}) is just another way to write origin/master. So origin/master..master is just a longer way to write @{u}..master. And, if you're on master, HEAD also names master, and omitting a branch name tells git to use HEAD, so @{u}.. suffices.
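The three spellings can be compared side by side. This is a sketch in a scratch clone with one unpushed commit; the paths and the commit helper are hypothetical.

```shell
#!/bin/sh
# Sketch: origin/master..master, @{u}..master and @{u}.. select the same commits.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Hypothetical helper: empty commit with a fixed identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c commit.gpgsign=false commit -q --allow-empty -m "$1"
}

git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
commit "base"
git remote add origin "$tmp/remote.git"
git push -q origin master
cd "$tmp"

git clone -q "$tmp/remote.git" mine
cd mine
commit "mine 1"

name=$(git rev-parse --abbrev-ref '@{u}')
long=$(git rev-list origin/master..master)
medium=$(git rev-list '@{u}..master')
short=$(git rev-list '@{u}..')
count=$(git rev-list --count '@{u}..')
echo "@{u} is $name; $count outgoing commit(s)"
```

The quotes around @{u} are there only to keep the shell from touching the braces; git sees the same text either way.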

As noted above, after you've run git fetch on the appropriate remote, you can use git log to find "what they have that you don't" and "what you have that they don't". You do have to run this git fetch step (and do not want a merge or rebase to occur at this point).

So:

git config --global alias.incoming '!git fetch && git log --oneline ..@{u}'
git config --global alias.outgoing '!git fetch && git log --oneline @{u}..'

(in some shells you may need a \ in front of the !, or other quoting tricks, and it may be easier to just insert the aliases with your editor by running git config --global --edit).

You can, of course, change the --oneline part to whatever options you prefer. (And I like to leave the git fetch step for me to manually run myself anyway, which simplifies the alias to just alias.incoming = log --oneline ..@{u}, for instance.[4] This mainly just avoids constantly pestering the upstream.)
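To try the aliases without touching your global configuration, a sketch like the following sets them in a scratch clone's own config (no --global). The repository names and the commit helper are assumptions for the demo; the alias bodies are the ones given above.

```shell
#!/bin/sh
# Sketch: repo-local incoming/outgoing aliases in a throwaway clone.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Hypothetical helper: empty commit with a fixed identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c commit.gpgsign=false commit -q --allow-empty -m "$1"
}

git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

git init -q theirs
cd theirs
git symbolic-ref HEAD refs/heads/master
commit "base"
git remote add origin "$tmp/remote.git"
git push -q origin master
cd "$tmp"

git clone -q "$tmp/remote.git" mine
(cd theirs && commit "theirs 1" && git push -q origin master)

cd mine
commit "mine 1"

# Same alias bodies as above, stored in this repository's config only.
git config alias.incoming '!git fetch && git log --oneline ..@{u}'
git config alias.outgoing '!git fetch && git log --oneline @{u}..'

# One output line per commit; tr strips the padding some wc versions add.
incoming=$(git incoming 2>/dev/null | wc -l | tr -d ' ')
outgoing=$(git outgoing 2>/dev/null | wc -l | tr -d ' ')
echo "incoming: $incoming; outgoing: $outgoing"
```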


[1] If you keep your branch names the same as theirs, you don't get a chance to see this. But once you start using branches heavily, you'll probably wind up with several branches that all track the same upstream, and then it really matters.

[2] git pull actually predates remotes and remote-tracking branches. It still has all kinds of weirdness because of this.

[3] If your git version is older than 1.8.4, when git pull runs git fetch, the fetch step doesn't update remote-tracking branches. This was intended as a feature, but it was a bad feature, and newer git versions update. This does mean, though, that if you have an old git, you should be wary of using the pull script: it's an inconvenient convenience.

[4] Fixed in an edit: I accidentally wrote alias.incoming = git log .... Git aliases are assumed to be other git commands (like log), so you want to leave out the git part, unless the whole alias starts with an exclamation point !, in which case the whole alias is passed to the shell to run. I actually forget, now, how aliases worked back when all the commands were spelled like git-log, git-branch, git-fetch, and so on, but it must have been less complicated... :-)

answered Oct 14 '22 05:10 by torek