When I manually check whether a local Git repository needs updates from a remote server, I run git remote show upstream
and interpret its output. But now I'm trying to do this within a Python application to which I'm adding Git support.
Actually I'm trying to determine if a given branch on the remote server differs from a given local branch, and if it does, what the relation is (fast-forwardable, ahead, behind, diverged).
I know how to do that by comparing the results of, say, git rev-list master..upstream/master
and git rev-list upstream/master..master
. But this works only after having fetched from the remote server.
Is there a way to accomplish such a comparison without fetching first?
One use is to update the application itself; for this I think it's acceptable to fetch first. But I also want to walk through all registered remotes and their branches to tell the user where he can get more stuff. I think it's unacceptable to fetch all remotes first because the user probably won't need most of them.
I assume ls-remote
is the command I'm looking for, but I don't see how I can achieve what I need. I can compare the result of git ls-remote --heads upstream
and git rev-parse HEAD^
to determine if there are differences, but I don't know how to proceed.
Do I have to use git ls-remote upstream
to get the complete list of commits and manually compare it to the list of local commits? Actually I'm hoping to find an equivalent to git rev-list
that also works with a remote repository.
Maybe someone knows how git remote show upstream
performs its comparisons?
EDIT: @torek: Thank you very much for your detailed answer. It will take some time to digest but I'll go through it at a more productive time of day, promised ;-)
Maybe there is some need for clarification about the context of the intended usage. Maybe some things are simpler than you suspect (because I'm not doing something like a generic Git GUI client).
We have an existing Python application that is hosted on GitHub. Only the main developer has push access to the repository, and he only exposes his master
branch publicly.
There are users who use the downloadable packages and users who run the application from the Git repository (which is particularly useful with Python as an interpreted language).
The first thing I'm currently implementing is an interface from within the application to update itself through Git. (OK, that's not really groundbreaking, as anybody could go to the command line and issue git pull origin master
or whatever he named the remote. But I'm taking this as a first (learning) step towards more advanced tools that offer Git workflows for working with the application's documents/projects.)
For this it's OK to always fetch
because someone who clicks on the "Check for updates" button is expected to accept a fetch. It's also quite clear how everything works: I determine the name of the remote by looking at the remotes' URLs to know which one (if there is more than one) points at the "official" repository.
But there are also users (like me) who are at the same time contributors. They generally have forked the repository and therefore have at least two remotes, the main repo and their personal fork. Sometimes they have also registered others' forks in order to inspect their contributions before they are merged into master. When I am approaching a pull request I also sometimes ask around to fetch my new material in order to give pre-pull-request feedback.
What I'm now trying to achieve is basically a list of all branches on all remotes, with information about which of them have new material and possibly how they are related to upstream/master
. E.g. tell that it is branched off from master 17 commits behind and contains 12 commits not contained in the upstream repo.
My reasoning is that it isn't good behaviour to fetch all those remote branches completely (and regularly). I think the user should only fetch branches he actually wants to inspect.
But from first reading through your answer it may well be that I will end up fetching everything in the background and then interpret the comparisons between the local and the 'local remote' branches.
Out of order:
Actually I'm hoping to find an equivalent to git rev-list that also works with a remote repository.
There isn't one. This is important below, if we want to see how many commits some remote has that we don't.
Actually I'm trying to determine if a given branch on the remote server differs from a given local branch, and if it does, how the relation is (fast-forwardable, ahead, behind, diverged). ... Is there a way to accomplish such a comparison without fetching first?
Well, mainly no, although this depends in part on how literal you want to be here, and how exact you need the results. Also, keep in mind that the moment after you disconnect from a remote, having gotten updates from it, someone else might connect to that same remote and change everything. You've also written the remote, as if there is only one; there might be more than one remote.
Using git fetch
makes a connection to the remote(s) and queries them regarding references (branch heads and tags mainly but also things like git notes), and then brings over any new stuff as wanted/needed.
Using git ls-remote
makes a connection to the remote(s) and queries them (and then stops there).
Thus, if the remote is "hard to reach" (e.g., establishing a connection takes a second or two, or requires entering something like an ssh password or phrase) but updates are small and/or fast (once connection is established, transfers are quick) it's more economical to just fetch
, because making a second connection later is painful. If it's "easy to reach" but updates may be large and/or slow, you may be better off with ls-remote
. But either way, you're making the connection to the remote, which you might consider to be "equivalent" to doing a fetch
. And if you need to list intermediate commit IDs, you have to bring those commits over, so you have to do a full fetch
.
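As a rough illustration of the ls-remote route from Python: the sketch below parses the text that git ls-remote prints (one SHA and one ref name per line) and reports which remote branches point somewhere other than where your remote-tracking refs say they did. The function names and the shape of the tracking dict are made up for this example. Note that it can only tell you that something differs, not how.

```python
def parse_ls_remote(output):
    """Turn `git ls-remote` output into a {refname: sha} dict."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Each line is "<sha><whitespace><refname>".
        sha, ref = line.split(None, 1)
        refs[ref.strip()] = sha
    return refs


def changed_heads(remote_refs, tracking):
    """Return branch names whose remote SHA differs from what we last saw.

    `tracking` maps branch name -> SHA of our remote-tracking ref.
    (Hypothetical input: the caller has to collect it, for instance
    from `git for-each-ref refs/remotes/<remote>`.)
    """
    prefix = "refs/heads/"
    changed = []
    for ref, sha in sorted(remote_refs.items()):
        if ref.startswith(prefix):
            name = ref[len(prefix):]
            if tracking.get(name) != sha:
                changed.append(name)
    return changed
```

In the application, the text passed to parse_ls_remote would presumably come from something like subprocess.run(["git", "ls-remote", "--heads", remote], capture_output=True, text=True, check=True).stdout.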
There's another wrinkle with fetch
that I will get to in a bit.
Let's take a look at sample git ls-remote
output, and git remote show origin
. I'll do a git fetch origin
first (though there's no output because it's already up to date):
$ git fetch origin
$ git ls-remote origin
120a630b0b71193a33cd033ae9ddcee1db3df07e HEAD
120a630b0b71193a33cd033ae9ddcee1db3df07e refs/heads/master
$ git remote show origin
* remote origin
  Fetch URL: ssh://[host]//tmp/tt.git/
  Push URL: ssh://[host]//tmp/tt.git/
  HEAD branch: master
  Local branch configured for 'git pull':
    master merges with remote master
  Local ref configured for 'git push':
    master pushes to master (fast-forwardable)
(The HEAD branch
shown here is a guess and you should generally ignore it. It's computed by matching up the SHA-1 for HEAD
against the SHA-1s for all the refs/heads/*
. It's only guaranteed to be correct if there's exactly one match. If there are two or more matches, it could accidentally be correct, but git needs a protocol change to make this work reliably.)
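To make the "guess" concrete, here is a small Python sketch of the matching rule just described. It is not git's actual code, only the same idea: compare the SHA-1 for HEAD against every refs/heads/* entry and trust the answer only when exactly one branch matches. The function name and the dict-of-refs input are assumptions of this example.

```python
def guess_head_branch(refs):
    """Guess which branch the remote's HEAD names, or None if unsure.

    `refs` is a {refname: sha} dict, e.g. parsed from `git ls-remote`.
    """
    head_sha = refs.get("HEAD")
    if head_sha is None:
        return None
    prefix = "refs/heads/"
    matches = [ref[len(prefix):] for ref, sha in refs.items()
               if ref.startswith(prefix) and sha == head_sha]
    # Only a unique match is trustworthy: two branches sitting on the
    # same commit cannot be told apart from the SHA alone.
    return matches[0] if len(matches) == 1 else None
```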
The URLs are from git config --get remote.origin.url
and git config --get remote.origin.pushurl
respectively (with the default push URL, if none is set, being the same as the fetch URL).
Now let's look at why master merges with remote master
. That's because of these two config items:
$ git config --get branch.master.remote
origin
$ git config --get branch.master.merge
refs/heads/master
(There's some deep weirdness, probably historical accident, in the latter setting. If you read the documentation for git merge you will see this:
The values of the
branch.<current branch>.merge
that name the branches at the remote named by branch.<current branch>.remote
are consulted, and then they are mapped via remote.<remote>.fetch
to their corresponding remote-tracking branches, and the tips of these tracking branches are merged.
With "sane" configurations—see git fetch
notes below—this means that refs/heads/master
above really means refs/remotes/origin/master
.)
Also, master pushes to master
in this particular case because I set git config push.default matching
in this repo, to make it act like git did before there was a push.default
. If you have a newer version of git and/or have not set push.default
, or have set it differently, it might push to something else. Possible values now are nothing
, current
, upstream
, simple
, and matching
; see the git-config documentation.
Now, as to why this push is a fast-forward: from the ls-remote
output, we see that the remote's refs/heads/master
(i.e., what our master
will push-to) refers to 120a630b0b71193a33cd033ae9ddcee1db3df07e
. As you already know (but maybe do not realize), we can see what we have that they don't with:
$ git rev-list 120a630b0b71193a33cd033ae9ddcee1db3df07e..master
eed7b697cab0cbd5babf382f720668e12a86cf2a
224384fed46e1949c88eb514fa67743be66a4c5a
ddc0aab680bab0bd6a7dde4a6ef8cb58ba0368e6
ade842c8562cdccd1e98f7ffd5149a12ddc9226c
We have four commits that they don't. And, because I ran git fetch
before I started all this and have a sane config, we can see what they have that we don't:
$ git rev-list master..120a630b0b71193a33cd033ae9ddcee1db3df07e
which is nothing. There's one more bit we need to know—in fact, we should start with this—namely: "is 120a630...
actually an ancestor of our master
(ade842c...
), or if not, is there some common ancestor between that and our master
?" I will use one abbreviated SHA-1, and the name master
, for length here:
$ if git merge-base --is-ancestor 120a630 master; then echo OK; fi
OK
—so this is "fast-forwardable": we're ahead 4 and behind 0. (In fact, being an ancestor implies immediately that we're not behind: it's the easiest test and is one you can perform if you have only the output of ls-remote
.)
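The ahead/behind bookkeeping can be sketched in Python along these lines. This assumes the remote's commit is already present in the local repository (that is, you have fetched), and it uses git rev-list --left-right --count with the three-dot form, which prints both counts in one go. The function names and the label strings are this example's own.

```python
import subprocess


def classify(ahead, behind):
    """Name the relation of a local branch to a remote one."""
    if ahead == 0 and behind == 0:
        return "up to date"
    if behind == 0:
        return "ahead"      # a push would be a fast-forward
    if ahead == 0:
        return "behind"     # a merge would be a fast-forward
    return "diverged"


def parse_counts(output):
    """Parse the two whitespace-separated counts printed by rev-list."""
    left, right = output.split()
    return int(left), int(right)


def ahead_behind(local, remote, cwd=None):
    """Count commits only in `local` (ahead) and only in `remote` (behind)."""
    result = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", f"{local}...{remote}"],
        cwd=cwd, capture_output=True, text=True, check=True)
    return parse_counts(result.stdout)
```

For the repository shown above, classify(*ahead_behind("master", "120a630")) should come out as "ahead", matching the four-and-zero result.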
If 120a630
were not an ancestor of master
, that would mean one of two things. Maybe our master
is completely unrelated to their master
, and we're not "ahead" or "behind" at all, we're on a completely different set of train-tracks. Or—probably more likely—they're just ahead of us (we can fast forward), or we have some common ancestor, with a commit graph fragment like this:
        D--E--F
       /
A--B--C
       \
        G--H
(where C
is the common ancestor and they're at F
and we're at H
, for instance, and we can rebase or merge).
To find out, we need to start with their master
and work backwards, and start with our master
and work backwards, and see if those meet at some point. Running git merge-base
will find the point, but this means we need to have not just their master
commit-ID F
, but also the in-between IDs (D
and E
) leading up to that point. Which again means we need to git fetch
!
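To see why the in-between commits matter, here is a toy model in plain Python (not git itself): the commit graph is a dict from each commit to its parents, and we walk backwards from both tips. The PARENTS dict encodes the example graph from the text; everything else is a name invented for this sketch.

```python
def ancestors(parents, tip):
    """Return the set of commits reachable from `tip`, including `tip`."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            # Commits we know nothing about simply have no parents here.
            stack.extend(parents.get(commit, ()))
    return seen


def compare(parents, ours, theirs):
    """Return (ahead, behind, common) as sets of commit IDs."""
    mine = ancestors(parents, ours)
    yours = ancestors(parents, theirs)
    return mine - yours, yours - mine, mine & yours


# The graph from the text: C is the fork point, they are at F, we are at H.
PARENTS = {
    "B": ["A"], "C": ["B"],
    "D": ["C"], "E": ["D"], "F": ["E"],
    "G": ["C"], "H": ["G"],
}
```

With the full graph, compare(PARENTS, "H", "F") reports G and H as ahead, D, E and F as behind, and A, B and C as common. Delete the entries for D, E and F from the dict (which is the situation when ls-remote has only told you the ID of F) and the walk from F goes nowhere, so the common set comes back empty even though the histories are related.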
If you run git fetch
, it will not only discover that their refs/heads/master
is at 120a630b0b71193a33cd033ae9ddcee1db3df07e
, it will also bring over any needed commits (possibly none, possibly many), which of course gets you their IDs so you can git rev-list
them.
Using git fetch
will also update our git references to set refs/remotes/origin/master
. But that is only because of this:
$ git config --get remote.origin.fetch
+refs/heads/*:refs/remotes/origin/*
This config item says that after fetch
gets the list of refs (the same ones ls-remote
prints), it should take any that match refs/heads/*
, change the name to refs/remotes/origin/<match>
, and stuff those into the local repo.
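The renaming step can be sketched in Python as below. This is a simplified model that handles only the usual source:destination form with at most one * on each side plus the optional leading +; real refspecs have more cases, and map_ref is a name made up for this example.

```python
def map_ref(refspec, ref):
    """Map a remote ref through a fetch refspec; None if it doesn't match."""
    # A leading "+" only means "update even if not a fast-forward".
    spec = refspec[1:] if refspec.startswith("+") else refspec
    src, _, dst = spec.partition(":")
    if "*" not in src:
        return dst if ref == src else None
    head, _, tail = src.partition("*")
    if (len(ref) < len(head) + len(tail)
            or not ref.startswith(head) or not ref.endswith(tail)):
        return None
    # Whatever the "*" matched on the left replaces the "*" on the right.
    middle = ref[len(head):len(ref) - len(tail)]
    return dst.replace("*", middle, 1)
```

With the default refspec shown above, refs/heads/master maps to refs/remotes/origin/master, while HEAD and tag refs do not match at all.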
It's possible to change this, so that git fetch
does not update origin/master
. If someone has done that, git rev-list origin/master..master
will not be useful. (And I'm not sure if we'd get commits D
, E
, and F
either! I've never run with a crazy fetch config.)
To summarize, you need to figure out:
- which remote(s) to consult
- which branches (refs/heads/*) correspond to those remotes (for pull and/or push)
- whether a push goes to the same name (matching, current, simple-if-name-same), a potentially different name (upstream), or "never" (nothing, simple-if-name-different)
- which refs a fetch updates under refs/remotes/ (based on the remote.name.fetch lines)
It's all quite messy, because push
and fetch
are asymmetric. It's possible that git push blarg
will push matching
(so if blarg
has a branch named glink
, we'll push our glink
there, even if glink
has no branch.glink.remote
set). There are also config variables remote.pushdefault
, remote.name.push
, and so on; and more configurations for fetch
as well (again, see git-config documentation).
(I suspect you're best off just running git fetch
, and then probably using git branch -vv
.)
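If you go the fetch-first route from Python, one possible approach (an assumption of mine, not something the question requires) is to ask git for-each-ref for the same tracking information that git branch -vv shows, for example with --format='%(refname:short) %(upstream:short) %(upstream:track)' refs/heads, and parse the bracketed part. A minimal sketch of that parsing step, with parse_track as a made-up name:

```python
import re


def parse_track(track):
    """Parse "[ahead 4, behind 2]"-style text into (ahead, behind).

    Returns None for "[gone]" (the upstream ref no longer exists).
    An empty string means nothing to report, which is returned as (0, 0).
    """
    if "gone" in track:
        return None
    ahead = re.search(r"ahead (\d+)", track)
    behind = re.search(r"behind (\d+)", track)
    return (int(ahead.group(1)) if ahead else 0,
            int(behind.group(1)) if behind else 0)
```

The track field is also empty for branches that have no upstream configured at all, so the caller should check the upstream name before reading (0, 0) as "in sync".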