
Plumbing equivalent to git remote show origin (use from Python)

Tags: git, python

When I manually check whether a local Git repository needs updates from a remote server, I run git remote show upstream and interpret its output. But now I'm trying to do this from within a Python application to which I'm adding Git support.

Actually I'm trying to determine if a given branch on the remote server differs from a given local branch, and if it does, how the relation is (fast-forwardable, ahead, behind, diverged).

I know how to do that by comparing the results of, say, git rev-list master..upstream/master and git rev-list upstream/master..master. But this works only after having fetched from the remote server.

Is there a way to accomplish such a comparison without fetching first?
One use is to update the application itself; for this I think it's acceptable to fetch first. But I also want to walk through all registered remotes and their branches to tell the user where he can get more stuff. I think it's unacceptable to fetch all remotes first because the user probably won't need most of them.

I assume ls-remote is the command I'm looking for, but I don't see how I can achieve what I need. I can compare the result of git ls-remote --heads upstream and git rev-parse HEAD^ to determine if there are differences, but I don't know how to proceed.
Do I have to use git ls-remote upstream to get the complete list of commits and manually compare it to the list of local commits? Actually I'm hoping to find an equivalent to git rev-list that also works with a remote repository.
Maybe someone knows how git remote show upstream performs its comparisons?

EDIT: @torek: Thank you very much for your detailed answer. It will take some time to digest but I'll go through it at a more productive time of day, promised ;-)
Maybe there is some need for clarification about the context of the intended usage. Maybe some things are simpler than you suspect (because I'm not doing something like a generic Git GUI client).

We have an existing Python application that is hosted on GitHub. Only the main developer has push access to the repository, and he only exposes his master branch publicly.

There are users who use the downloadable packages and users who run the application from the Git repository (which is particularly useful with Python as an interpreted language).

The first thing I'm currently implementing is an interface from within the application to update itself through Git. (OK, that's not really groundbreaking, as anybody could go to the command line and issue git pull origin master or whatever he named the remote. But I'm taking this as a first (learning) step towards more advanced tools that offer Git workflows for working with the application's documents/projects.)
For this it's OK to always fetch because someone who clicks on the "Check for updates" button is expected to accept a fetch. It's also quite clear how everything works: I determine the name of the remote by looking at the remotes' URLs to know which one (if there is more than one) points at the "official" repository.

But there are also users (like me) who are at the same time contributors. They generally have forked the repository and therefore have at least two remotes, the main repo and their personal fork. Sometimes they have also registered others' forks in order to inspect their contributions before they are merged into master. When I am preparing a pull request I also sometimes ask others to fetch my new material so that they can give pre-pull-request feedback.

What I'm now trying to achieve is basically a list of all branches on all remotes, with information about which of them have new material and possibly how they are related to upstream/master. For example, it could tell that a branch is 17 commits behind master and contains 12 commits not contained in the upstream repo.
My reasoning is that it isn't good behaviour to fetch all those remote branches completely (and regularly). I think the user should only fetch branches he actually wants to inspect.

But from first reading through your answer it may well be that I will end up fetching everything in the background and then interpret the comparisons between the local and the 'local remote' branches.


asked Oct 11 '13 07:10

uli_1973

1 Answer

Out of order:

> Actually I'm hoping to find an equivalent to git rev-list that also works with a remote repository.

There isn't one. This is important below, if we want to see how many commits some remote has that we don't.

> Actually I'm trying to determine if a given branch on the remote server differs from a given local branch, and if it does, how the relation is (fast-forwardable, ahead, behind, diverged). ... Is there a way to accomplish such a comparison without fetching first?

Well, mainly no, although this depends in part on how literal you want to be here, and how exact you need the results. Also, keep in mind that the moment after you disconnect from a remote, having gotten updates from it, someone else might connect to that same remote and change everything. You've also written "the remote", as if there is only one; there might be more than one remote.

Using git fetch makes a connection to the remote(s) and queries them regarding references (branch heads and tags mainly but also things like git notes), and then brings over any new stuff as wanted/needed.

Using git ls-remote makes a connection to the remote(s) and queries them (and then stops there).

Thus, if the remote is "hard to reach" (e.g., establishing a connection takes a second or two, or requires entering something like an ssh password or phrase) but updates are small and/or fast (once connection is established, transfers are quick) it's more economical to just fetch, because making a second connection later is painful. If it's "easy to reach" but updates may be large and/or slow, you may be better off with ls-remote. But either way, you're making the connection to the remote, which you might consider to be "equivalent" to doing a fetch. And if you need to list intermediate commit IDs, you have to bring those commits over, so you have to do a full fetch.

There's another wrinkle with fetch that I will get to in a bit.

Let's take a look at sample git ls-remote output, and git remote show origin. I'll do a git fetch origin first (though there's no output because it's already up to date):

$ git fetch origin
$ git ls-remote origin
120a630b0b71193a33cd033ae9ddcee1db3df07e    HEAD
120a630b0b71193a33cd033ae9ddcee1db3df07e    refs/heads/master
$ git remote show origin
* remote origin
  Fetch URL: ssh://[host]//tmp/tt.git/
  Push  URL: ssh://[host]//tmp/tt.git/
  HEAD branch: master
  Local branch configured for 'git pull':
    master merges with remote master
  Local ref configured for 'git push':
    master pushes to master (fast-forwardable)
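
From Python, the ls-remote listing above is easy to turn into a dictionary. This is only a minimal sketch, assuming git is on the PATH; the helper names parse_ls_remote and ls_remote are made up for this illustration and are not part of git or any library:

```python
import subprocess


def parse_ls_remote(text):
    """Turn `git ls-remote` output into a {ref name: SHA-1} dict."""
    refs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "<sha><whitespace><ref>"; git separates them with a tab.
        sha, ref = line.split(None, 1)
        refs[ref.strip()] = sha
    return refs


def ls_remote(remote="origin"):
    """Contact `remote` and list its refs (hypothetical helper; needs network access)."""
    out = subprocess.run(
        ["git", "ls-remote", remote],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ls_remote(out)


# The sample listing from above, used here so the sketch runs without a remote.
sample = (
    "120a630b0b71193a33cd033ae9ddcee1db3df07e\tHEAD\n"
    "120a630b0b71193a33cd033ae9ddcee1db3df07e\trefs/heads/master\n"
)
print(parse_ls_remote(sample)["refs/heads/master"])
```

Keeping the parsing separate from the subprocess call means the parsing can be exercised on canned text, without touching the network.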

(The HEAD branch shown here is a guess and you should generally ignore it. It's computed by matching up the SHA-1 for HEAD against the SHA-1s for all the refs/heads/*. It's only guaranteed to be correct if there's exactly one match. If there are two or more matches, it could accidentally be correct, but git needs a protocol change to make this work reliably.)

The URLs are from git config --get remote.origin.url and git config --get remote.origin.pushurl respectively (with the default push URL, if none is set, being the same as the fetch URL).

Now let's look at why master merges with remote master. That's because of these two config items:

$ git config --get branch.master.remote
origin
$ git config --get branch.master.merge
refs/heads/master

(There's some deep weirdness, probably historical accident, in the latter setting. If you read the documentation for git merge you will see this:

> The values of the branch.<current branch>.merge that name the branches at the remote named by branch.<current branch>.remote are consulted, and then they are mapped via remote.<remote>.fetch to their corresponding remote-tracking branches, and the tips of these tracking branches are merged.

With "sane" configurations—see git fetch notes below—this means that refs/heads/master above really means refs/remotes/origin/master.)
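
In Python, the lookup of those two config items might be sketched like this. The function names git_config and tracking_ref are invented for the example, and the sketch simply assumes the default fetch refspec instead of reading remote.<remote>.fetch:

```python
import subprocess


def git_config(key):
    """Return the value of `git config --get key`, or None if it is unset."""
    result = subprocess.run(
        ["git", "config", "--get", key],
        capture_output=True, text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None


def tracking_ref(branch, get_config=git_config):
    """Guess the remote-tracking ref that `branch` merges with.

    Assumption: the remote uses the default fetch refspec
    +refs/heads/*:refs/remotes/<remote>/*.
    """
    remote = get_config("branch.%s.remote" % branch)
    merge = get_config("branch.%s.merge" % branch)
    if remote is None or merge is None:
        return None  # no upstream configured for this branch
    if remote == ".":
        return merge  # the upstream is another local branch
    prefix = "refs/heads/"
    if not merge.startswith(prefix):
        return None  # not a branch on the remote; out of scope for this sketch
    return "refs/remotes/%s/%s" % (remote, merge[len(prefix):])
```

The get_config parameter is there only so that the mapping can be tried with a plain dictionary, e.g. tracking_ref("master", {"branch.master.remote": "origin", "branch.master.merge": "refs/heads/master"}.get).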

Also, master pushes to master in this particular case because I set git config push.default matching in this repo, to make it act like git did before there was a push.default. If you have a newer version of git and/or have not set push.default, or have set it differently, it might push to something else. Possible values now are nothing, current, upstream, simple, and matching; see the git-config documentation.

Now, as to why this push is a fast-forward: from the ls-remote output, we see that the remote's refs/heads/master (i.e., what our master will push-to) refers to 120a630b0b71193a33cd033ae9ddcee1db3df07e. As you already know (but maybe do not realize), we can see what we have that they don't with:

$ git rev-list 120a630b0b71193a33cd033ae9ddcee1db3df07e..master
eed7b697cab0cbd5babf382f720668e12a86cf2a
224384fed46e1949c88eb514fa67743be66a4c5a
ddc0aab680bab0bd6a7dde4a6ef8cb58ba0368e6
ade842c8562cdccd1e98f7ffd5149a12ddc9226c

We have four commits that they don't. And, because I ran git fetch before I started all this and have a sane config, we can see what they have that we don't:

$ git rev-list master..120a630b0b71193a33cd033ae9ddcee1db3df07e

which is nothing. There's one more bit we need to know—in fact, we should start with this—namely: "is 120a630... actually an ancestor of our master (ade842c...), or if not, is there some common ancestor between that and our master?" I will use one abbreviated SHA-1, and the name master, for length here:

$ if git merge-base --is-ancestor 120a630 master; then echo OK; fi
OK

—so this is "fast-forwardable": we're ahead 4 and behind 0. (In fact, being an ancestor implies immediately that we're not behind: it's the easiest test and is one you can perform if you have only the output of ls-remote.)
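
A rough Python version of that test, usable with nothing but an ls-remote SHA-1, could look like the following. It is a sketch under assumptions: the names git_succeeds and check_without_fetch and the three result labels are invented here, and any git error is lumped together with "not an ancestor":

```python
import subprocess


def git_succeeds(args, run=subprocess.run):
    """Return True if `git <args>` exits with status 0."""
    result = run(["git"] + list(args),
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def check_without_fetch(remote_sha, local_ref="master", run=subprocess.run):
    """Classify a remote tip using only objects already in the local repo."""
    # Do we have the remote's tip commit at all? If not, only a fetch can tell more.
    if not git_succeeds(["cat-file", "-e", remote_sha + "^{commit}"], run):
        return "need-fetch"
    # Their tip is an ancestor of ours: we are not behind (maybe ahead).
    if git_succeeds(["merge-base", "--is-ancestor", remote_sha, local_ref], run):
        return "not-behind"
    # We have the commit, but it is not in our history: behind, diverged or unrelated.
    return "behind-or-diverged"
```

The run parameter defaults to subprocess.run and exists only so the logic can be tried with a stand-in that returns canned exit codes.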

If 120a630 were not an ancestor of master, that would mean one of two things. Maybe our master is completely unrelated to their master, and we're not "ahead" or "behind" at all, we're on a completely different set of train-tracks. Or—probably more likely—they're just ahead of us (we can fast forward), or we have some common ancestor, with a commit graph fragment like this:

        D--E--F
       /
A--B--C
       \
        G--H

(where C is the common ancestor and they're at F and we're at H, for instance, and we can rebase or merge).

To find out, we need to start with their master and work backwards, and start with our master and work backwards, and see if those meet at some point. Using git merge-base will find that point, but this means we need to have not just their master commit ID F, but also the in-between IDs (D and E) leading up to that point. Which again means we need to git fetch!

If you run git fetch, it will not only discover that their refs/heads/master is at 120a630b0b71193a33cd033ae9ddcee1db3df07e, it will also bring over any needed commits (possibly none, possibly many), which of course gets you their IDs so you can git rev-list them.
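
Once the commits are local, both counts can be had from a single rev-list call with the symmetric-difference (three-dot) syntax. The sketch below assumes the fetch has already happened and that origin/master is the remote-tracking ref; the function names are made up for the illustration:

```python
import subprocess


def parse_counts(output):
    """Parse the "<left> TAB <right>" line from rev-list --left-right --count."""
    ahead, behind = output.split()
    return int(ahead), int(behind)


def classify(ahead, behind):
    """Name the relation between our branch and theirs."""
    if ahead == 0 and behind == 0:
        return "up to date"
    if behind == 0:
        return "ahead"     # pushing would be a fast-forward
    if ahead == 0:
        return "behind"    # we can fast-forward to their tip
    return "diverged"      # merge or rebase needed (also what unrelated histories look like)


def ahead_behind(local="master", remote="origin/master"):
    """Count commits only in `local` (ahead) and only in `remote` (behind)."""
    out = subprocess.run(
        ["git", "rev-list", "--left-right", "--count",
         "%s...%s" % (local, remote)],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_counts(out)
```

With the example repository above, classify(*ahead_behind()) would be expected to report "ahead", since we have four commits they lack and they have none we lack.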

Using git fetch will also update our git references to set refs/remotes/origin/master. But that is only because of this:

$ git config --get remote.origin.fetch
+refs/heads/*:refs/remotes/origin/*

This config item says that after fetch gets the list of refs (the same ones ls-remote prints), it should take any that match refs/heads/*, change the name to refs/remotes/origin/<match>, and stuff those into the local repo.
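
That renaming can be reproduced in a few lines of Python, which is handy for predicting where a ref seen in ls-remote output would land locally. This is a simplified sketch with a made-up function name, map_ref; it handles a single refspec with at most one * and ignores anything more exotic, such as negative refspecs:

```python
def map_ref(ref, refspec):
    """Map a remote ref name through one fetch refspec; None if it does not match."""
    # A leading "+" only means "update even if not a fast-forward".
    spec = refspec[1:] if refspec.startswith("+") else refspec
    src, _, dst = spec.partition(":")
    if not dst:
        return None  # no destination: nothing is stored under a local name
    if "*" not in src:
        return dst if ref == src else None
    prefix, _, suffix = src.partition("*")
    if not (ref.startswith(prefix) and ref.endswith(suffix)
            and len(ref) >= len(prefix) + len(suffix)):
        return None
    matched = ref[len(prefix):len(ref) - len(suffix)]
    return dst.replace("*", matched, 1)


print(map_ref("refs/heads/master", "+refs/heads/*:refs/remotes/origin/*"))
```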

It's possible to change this, so that git fetch does not update origin/master. If someone has done that, git rev-list origin/master..master will not be useful. (And I'm not sure if we'd get commits D, E, and F either! I've never run with a crazy fetch config.)

To summarize, you need to figure out:

- which remote(s) to contact, if any
- which local branches (refs/heads/*) correspond to those remotes (for pull and/or push)
- whether their branch heads are related to ours (whether by the same names or different names)
- whether pushing will push to the same name (matching, current, simple-if-name-same), a potentially different name (upstream), or "never" (nothing, simple-if-name-different)
- whether, if you choose not to contact some or all remotes, to trust refs in refs/remotes/ (based on remote.name.fetch lines)

It's all quite messy, because push and fetch are asymmetric. It's possible that git push blarg will push matching (so if blarg has a branch named glink, we'll push our glink there, even if glink has no branch.glink.remote set). There are also config variables remote.pushdefault, remote.name.push, and so on; and more configurations for fetch as well (again, see git-config documentation).

(I suspect you're best off just running git fetch, and then probably using git branch -vv.)
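
If you still want a cheap "which remotes have something new" overview before deciding what to fetch, one option is to compare the heads reported by ls-remote with the remote-tracking refs already stored locally. The sketch below assumes the default fetch refspec; compare_heads and its four labels are invented for this example, and it can only say that a branch changed, not by how many commits:

```python
def compare_heads(remote_name, remote_refs, local_refs):
    """Compare ls-remote output with local remote-tracking refs, without fetching.

    remote_refs: {ref: sha} as listed by `git ls-remote --heads <remote>`
    local_refs:  {ref: sha} for refs/remotes/<remote>/*, e.g. collected from
                 git for-each-ref --format="%(refname) %(objectname)" refs/remotes/<remote>
    """
    report = {}
    head_prefix = "refs/heads/"
    tracking_prefix = "refs/remotes/%s/" % remote_name
    for ref, sha in remote_refs.items():
        if not ref.startswith(head_prefix):
            continue
        branch = ref[len(head_prefix):]
        known = local_refs.get(tracking_prefix + branch)
        if known is None:
            report[branch] = "new"        # branch we have never fetched
        elif known != sha:
            report[branch] = "changed"    # tip moved since the last fetch
        else:
            report[branch] = "unchanged"
    for ref in local_refs:
        if ref.startswith(tracking_prefix):
            branch = ref[len(tracking_prefix):]
            # refs/remotes/<remote>/HEAD is a local symbolic ref, not a branch.
            if branch != "HEAD" and head_prefix + branch not in remote_refs:
                report[branch] = "gone"   # deleted on the remote
    return report
```

Only the branches reported as "new" or "changed" would then need an actual fetch before rev-list or merge-base can say how they relate to upstream/master.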

answered Sep 29 '22 17:09

torek
