Plumbing equivalent to git remote show origin (use from Python)

Tags:

git

python

When I manually check whether a local Git repository needs updates from a remote server, I run git remote show upstream and interpret its output. But now I'm trying to do this within a Python application to which I'm adding Git support.

Specifically, I'm trying to determine whether a given branch on the remote server differs from a given local branch and, if it does, what the relationship is (fast-forwardable, ahead, behind, diverged).

I know how to do that by comparing the results of, say, git rev-list master..upstream/master and git rev-list upstream/master..master. But this only works after fetching from the remote server.
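From Python, both counts can be obtained in one call with `git rev-list --left-right --count local...remote`. That command prints two numbers separated by a tab: the commits only on the left side and the commits only on the right side. Below is a minimal sketch; the function names are illustrative. Like the two-command version, it needs the remote-tracking ref to exist locally, which means it needs a prior fetch.

```python
import subprocess
from typing import Tuple


def parse_counts(output: str) -> Tuple[int, int]:
    """Parse the '<left><TAB><right>' line printed by --left-right --count."""
    left, right = output.split()
    return int(left), int(right)


def ahead_behind(local: str, remote: str, cwd=None) -> Tuple[int, int]:
    """Return (ahead, behind) of `local` relative to `remote`.

    Same numbers as counting the lines of `git rev-list remote..local`
    (ahead) and `git rev-list local..remote` (behind), but in one call.
    Both refs must already exist locally, i.e. this only works after a fetch.
    """
    out = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", f"{local}...{remote}"],
        cwd=cwd, check=True, capture_output=True, text=True,
    ).stdout
    return parse_counts(out)


def classify(ahead: int, behind: int) -> str:
    """Name the relationship; unrelated histories also land in 'diverged'."""
    if ahead == 0 and behind == 0:
        return "up to date"
    if ahead == 0:
        return "behind (fast-forwardable)"
    if behind == 0:
        return "ahead"
    return "diverged"
```

Usage would be something like `classify(*ahead_behind("master", "upstream/master"))`. Unrelated histories are counted as "diverged" here. A merge-base check is needed to tell the two cases apart.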

Is there a way to accomplish such a comparison without fetching first?
One use is updating the application itself; for this I think it's acceptable to fetch first. But I also want to walk through all registered remotes and their branches to tell the user where he can get more stuff. I think it's unacceptable to fetch all remotes first, because the user probably won't need most of them.

I assume ls-remote is the command I'm looking for, but I don't see how I can achieve what I need. I can compare the result of git ls-remote --heads upstream and git rev-parse HEAD^ to determine if there are differences, but I don't know how to proceed.
Do I have to use git ls-remote upstream to get the complete list of commits and manually compare it to the list of local commits? Actually I'm hoping to find an equivalent to git rev-list that also works with a remote repository.
Maybe someone knows how git remote show upstream performs its comparisons?


EDIT: @torek: Thank you very much for your detailed answer. It will take some time to digest but I'll go through it at a more productive time of day, promised ;-)
Maybe there is some need for clarification about the context of the intended usage. Maybe some things are simpler than you suspect (because I'm not doing something like a generic Git GUI client).

We have an existing Python application that is hosted on GitHub. Only the main developer has push access to the repository, and he only exposes his master branch publicly.

There are users who use the downloadable packages and users who run the application from the Git repository (which is particularly useful with Python as an interpreted language).

The first thing I'm currently implementing is an interface within the application to update itself through Git. (OK, that's not really groundbreaking, as anybody could go to the command line and issue git pull origin master, or whatever he named the remote.) But I'm taking this as a first (learning) step towards more advanced tools that offer Git workflows for working with the application's documents/projects.
For this it's OK to always fetch, because someone who clicks the "Check for updates" button is expected to accept a fetch. It's also quite clear how everything works: I determine the name of the remote by looking at the remotes' URLs to find which one (if there is more than one) points at the "official" repository.
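Picking the "official" remote by URL might look like the sketch below. It reads every `remote.<name>.url` with `git config --get-regexp`, which exits with status 1 when nothing matches. The `normalize` helper is a hypothetical choice that only treats a trailing `/` and `.git` as insignificant. As a result, an SSH URL and an HTTPS URL for the same GitHub repository will not match.

```python
import subprocess
from typing import Dict, Optional


def parse_remote_urls(output: str) -> Dict[str, str]:
    """Turn 'remote.<name>.url <url>' lines into {name: url}."""
    urls = {}
    for line in output.splitlines():
        key, _, value = line.partition(" ")
        # <name> may itself contain dots, so strip the fixed prefix and suffix.
        urls[key[len("remote."):-len(".url")]] = value
    return urls


def remote_urls(cwd=None) -> Dict[str, str]:
    """Map every configured remote to its fetch URL."""
    result = subprocess.run(
        ["git", "config", "--get-regexp", r"^remote\..*\.url$"],
        cwd=cwd, capture_output=True, text=True,
    )
    if result.returncode not in (0, 1):  # 1 just means "no remotes configured"
        result.check_returncode()
    return parse_remote_urls(result.stdout)


def normalize(url: str) -> str:
    # Assumption: ".../app", ".../app/" and ".../app.git" name the same repository.
    url = url.rstrip("/")
    return url[:-4] if url.endswith(".git") else url


def find_official_remote(urls: Dict[str, str], official_url: str) -> Optional[str]:
    """Return the first remote (by name) whose URL matches official_url."""
    target = normalize(official_url)
    for name, url in sorted(urls.items()):
        if normalize(url) == target:
            return name
    return None
```

With a hypothetical official URL, usage would be `find_official_remote(remote_urls(), "https://github.com/example/app.git")`.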

But there are also users (like me) who are contributors at the same time. They have generally forked the repository and therefore have at least two remotes: the main repo and their personal fork. Sometimes they have also registered others' forks in order to inspect their contributions before they are merged into master. When I am preparing a pull request, I also sometimes ask others to fetch my new material in order to give pre-pull-request feedback.

What I'm now trying to achieve is basically a list of all branches on all remotes, with information about which of them have new material and possibly how they relate to upstream/master. For example: tell the user that a branch forked off master, is now 17 commits behind it, and contains 12 commits not contained in the upstream repo.
My reasoning is that it isn't good behaviour to fetch all those remote branches completely (and regularly). I think the user should only fetch branches he actually wants to inspect.

But from a first reading of your answer, it may well be that I will end up fetching everything in the background and then interpreting the comparisons between the local and the 'local remote' branches.

uli_1973 asked Oct 11 '13 07:10




1 Answer

Out of order:

Actually I'm hoping to find an equivalent to git rev-list that also works with a remote repository.

There isn't one. This is important below, if we want to see how many commits some remote has that we don't.

Actually I'm trying to determine if a given branch on the remote server differs from a given local branch, and if it does, how the relation is (fast-forwardable, ahead, behind, diverged). ... Is there a way to accomplish such a comparison without fetching first?

Well, mainly no, although this depends in part on how literal you want to be here, and how exact you need the results. Also, keep in mind that the moment after you disconnect from a remote, having gotten updates from it, someone else might connect to that same remote and change everything. You've also written the remote, as if there is only one; there might be more than one remote.

Using git fetch makes a connection to the remote(s) and queries them regarding references (branch heads and tags mainly but also things like git notes), and then brings over any new stuff as wanted/needed.

Using git ls-remote makes a connection to the remote(s) and queries them (and then stops there).

Thus, if the remote is "hard to reach" (e.g., establishing a connection takes a second or two, or requires entering something like an ssh password or phrase) but updates are small and/or fast (once connection is established, transfers are quick) it's more economical to just fetch, because making a second connection later is painful. If it's "easy to reach" but updates may be large and/or slow, you may be better off with ls-remote. But either way, you're making the connection to the remote, which you might consider to be "equivalent" to doing a fetch. And if you need to list intermediate commit IDs, you have to bring those commits over, so you have to do a full fetch.

There's another wrinkle with fetch that I will get to in a bit.

Let's take a look at sample git ls-remote output, and git remote show origin. I'll do a git fetch origin first (though there's no output because it's already up to date):

$ git fetch origin
$ git ls-remote origin
120a630b0b71193a33cd033ae9ddcee1db3df07e    HEAD
120a630b0b71193a33cd033ae9ddcee1db3df07e    refs/heads/master
$ git remote show origin
* remote origin
  Fetch URL: ssh://[host]//tmp/tt.git/
  Push  URL: ssh://[host]//tmp/tt.git/
  HEAD branch: master
  Local branch configured for 'git pull':
    master merges with remote master
  Local ref configured for 'git push':
    master pushes to master (fast-forwardable)

(The HEAD branch shown here is a guess and you should generally ignore it. It's computed by matching up the SHA-1 for HEAD against the SHA-1s for all the refs/heads/*. It's only guaranteed to be correct if there's exactly one match. If there are two or more matches, it could accidentally be correct, but git needs a protocol change to make this work reliably.)
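A script can collect the same ref list and repeat that HEAD guess. The sketch below parses `git ls-remote` output into a dict; each line holds a SHA-1, a tab, and a ref name. `head_candidates` is a hypothetical helper that mirrors the matching described above. Treat its result as reliable only when it returns exactly one name.

```python
import subprocess
from typing import Dict, List


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Turn '<sha1><TAB><refname>' lines into {refname: sha1}."""
    refs = {}
    for line in output.splitlines():
        if line.strip():
            sha, refname = line.split(None, 1)
            refs[refname.strip()] = sha
    return refs


def ls_remote(remote: str, cwd=None) -> Dict[str, str]:
    """Contact `remote` and list its refs without fetching any objects."""
    out = subprocess.run(
        ["git", "ls-remote", remote],
        cwd=cwd, check=True, capture_output=True, text=True,
    ).stdout
    return parse_ls_remote(out)


def branch_heads(refs: Dict[str, str]) -> Dict[str, str]:
    """Keep only refs/heads/*, keyed by short branch name."""
    prefix = "refs/heads/"
    return {name[len(prefix):]: sha for name, sha in refs.items()
            if name.startswith(prefix)}


def head_candidates(refs: Dict[str, str]) -> List[str]:
    """Branches whose SHA-1 equals HEAD's; only trustworthy if exactly one."""
    head = refs.get("HEAD")
    return sorted(name for name, sha in branch_heads(refs).items() if sha == head)
```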

The URLs are from git config --get remote.origin.url and git config --get remote.origin.pushurl respectively (with the default push URL, if none is set, being the same as the fetch URL).
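The same lookup, including the fallback from pushurl to url, takes a few lines of Python. In this sketch, `git config --get` exits with status 1 for an unset key, and the code turns that into None. The sketch ignores multi-valued url/pushurl entries and url.<base>.insteadOf rewriting. Newer Git versions also provide `git remote get-url --push origin`.

```python
import subprocess
from typing import Optional, Tuple


def git_config_get(key: str, cwd=None) -> Optional[str]:
    """Return a config value, or None when the key is unset (exit status 1)."""
    result = subprocess.run(["git", "config", "--get", key],
                            cwd=cwd, capture_output=True, text=True)
    if result.returncode == 1:
        return None
    result.check_returncode()
    return result.stdout.strip()


def fetch_and_push_urls(remote: str, get=git_config_get) -> Tuple[Optional[str], Optional[str]]:
    """(fetch URL, push URL); the push URL defaults to the fetch URL."""
    fetch_url = get(f"remote.{remote}.url")
    push_url = get(f"remote.{remote}.pushurl") or fetch_url
    return fetch_url, push_url
```

The `get` parameter lets a plain dict's `.get` stand in for a real repository when experimenting.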

Now let's look at why master merges with remote master. That's because of these two config items:

$ git config --get branch.master.remote
origin
$ git config --get branch.master.merge
refs/heads/master

(There's some deep weirdness, probably historical accident, in the latter setting. If you read the documentation for git merge you will see this:

The values of the branch.<current branch>.merge that name the branches at the remote named by branch.<current branch>.remote are consulted, and then they are mapped via remote.<remote>.fetch to their corresponding remote-tracking branches, and the tips of these tracking branches are merged.

With "sane" configurations—see git fetch notes below—this means that refs/heads/master above really means refs/remotes/origin/master.)
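Reading these two settings from Python is straightforward. The sketch below uses hypothetical helper names. `configured_upstream` returns the raw config values. `upstream_tracking_ref` asks git to do the remote.<remote>.fetch mapping itself via the `<branch>@{upstream}` syntax. When an upstream is configured, that prints e.g. refs/remotes/origin/master.

```python
import subprocess
from typing import Optional, Tuple


def git_config_get(key: str, cwd=None) -> Optional[str]:
    """Return a config value, or None when the key is unset (exit status 1)."""
    result = subprocess.run(["git", "config", "--get", key],
                            cwd=cwd, capture_output=True, text=True)
    if result.returncode == 1:
        return None
    result.check_returncode()
    return result.stdout.strip()


def configured_upstream(branch: str, get=git_config_get) -> Optional[Tuple[str, str]]:
    """Return (remote, merge_ref) from branch.<name>.remote/.merge, or None."""
    remote = get(f"branch.{branch}.remote")
    merge = get(f"branch.{branch}.merge")
    if remote is None or merge is None:
        return None
    return remote, merge


def upstream_tracking_ref(branch: str, cwd=None) -> Optional[str]:
    """Let git map branch.<name>.merge through remote.<remote>.fetch."""
    result = subprocess.run(
        ["git", "rev-parse", "--symbolic-full-name", f"{branch}@{{upstream}}"],
        cwd=cwd, capture_output=True, text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None
```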

Also, master pushes to master in this particular case because I set git config push.default matching in this repo, to make it act like git did before there was a push.default. If you have a newer version of git and/or have not set push.default, or have set it differently, it might push to something else. Possible values now are nothing, current, upstream, simple, and matching; see the git-config documentation.

Now, as to why this push is a fast-forward: from the ls-remote output, we see that the remote's refs/heads/master (i.e., what our master will push-to) refers to 120a630b0b71193a33cd033ae9ddcee1db3df07e. As you already know (but maybe do not realize), we can see what we have that they don't with:

$ git rev-list 120a630b0b71193a33cd033ae9ddcee1db3df07e..master
eed7b697cab0cbd5babf382f720668e12a86cf2a
224384fed46e1949c88eb514fa67743be66a4c5a
ddc0aab680bab0bd6a7dde4a6ef8cb58ba0368e6
ade842c8562cdccd1e98f7ffd5149a12ddc9226c

We have four commits that they don't. And, because I ran git fetch before I started all this and have a sane config, we can see what they have that we don't:

$ git rev-list master..120a630b0b71193a33cd033ae9ddcee1db3df07e

which is nothing. There's one more bit we need to know—in fact, we should start with this—namely: "is 120a630... actually an ancestor of our master (ade842c...), or if not, is there some common ancestor between that and our master?" I will use one abbreviated SHA-1, and the name master, for length here:

$ if git merge-base --is-ancestor 120a630 master; then echo OK; fi
OK

—so this is "fast-forwardable": we're ahead 4 and behind 0. (In fact, being an ancestor implies immediately that we're not behind: it's the easiest test and is one you can perform if you have only the output of ls-remote.)
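The ancestor test is easy to script because `git merge-base --is-ancestor` reports its answer through the exit status. Status 0 means yes, 1 means no, and any other non-zero status means git could not answer. This sketch assumes that an error for a SHA-1 taken from ls-remote output most likely means we have not fetched that commit yet. That would itself indicate the remote has something new.

```python
import subprocess
from typing import Optional


def interpret_is_ancestor(returncode: int) -> Optional[bool]:
    """0: ancestor; 1: not an ancestor; anything else: git could not decide."""
    if returncode == 0:
        return True
    if returncode == 1:
        return False
    return None  # e.g. the commit is unknown locally (not fetched yet)


def is_ancestor(maybe_ancestor: str, descendant: str, cwd=None) -> Optional[bool]:
    """Wrap `git merge-base --is-ancestor A B` without raising on failure."""
    result = subprocess.run(
        ["git", "merge-base", "--is-ancestor", maybe_ancestor, descendant],
        cwd=cwd, capture_output=True, text=True,
    )
    return interpret_is_ancestor(result.returncode)
```

For the example above, `is_ancestor("120a630", "master")` would correspond to the `OK` result.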

If 120a630 were not an ancestor of master, that would mean one of three things. Maybe our master is completely unrelated to their master, and we're not "ahead" or "behind" at all: we're on a completely different set of train tracks. Or, probably more likely, they're just ahead of us (we can fast-forward), or we have some common ancestor, with a commit graph fragment like this:

        D--E--F
       /
A--B--C
       \
        G--H

(where C is the common ancestor and they're at F and we're at H, for instance, and we can rebase or merge).

To find out, we need to start with their master and work backwards, start with our master and work backwards, and see whether those walks meet at some point. We can use git merge-base to find that point, but this means we need to have not just their master commit ID F, but also the in-between IDs (D and E) leading up to it. Which again means we need to git fetch!

If you run git fetch, it will not only discover that their refs/heads/master is at 120a630b0b71193a33cd033ae9ddcee1db3df07e, it will also bring over any needed commits (possibly none, possibly many), which of course gets you their IDs so you can git rev-list them.
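To make the "walk backwards from both tips until they meet" idea concrete, here is a toy model of the A-to-H graph from the diagram above, held in a Python dict. It only illustrates what git merge-base and git rev-list compute, not how git implements them. It also only works because every commit is present, which is exactly what the fetch provides.

```python
from typing import Dict, List, Set

# The commit graph from the diagram: each commit maps to its parent list.
GRAPH: Dict[str, List[str]] = {
    "A": [], "B": ["A"], "C": ["B"],
    "D": ["C"], "E": ["D"], "F": ["E"],  # their master is at F
    "G": ["C"], "H": ["G"],              # our master is at H
}


def ancestors(graph: Dict[str, List[str]], tip: str) -> Set[str]:
    """All commits reachable from tip, tip included (like `git rev-list tip`)."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen


def merge_bases(graph: Dict[str, List[str]], a: str, b: str) -> Set[str]:
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {c for c in common
            if not any(c != o and c in ancestors(graph, o) for o in common)}
```

Set differences play the role of `git rev-list F..H` (ours only: G, H) and `git rev-list H..F` (theirs only: D, E, F). An empty `merge_bases` result corresponds to the "completely unrelated" case.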

Using git fetch will also update our git references to set refs/remotes/origin/master. But that is only because of this:

$ git config --get remote.origin.fetch
+refs/heads/*:refs/remotes/origin/*

This config item says that after fetch gets the list of refs (the same ones ls-remote prints), it should take any that match refs/heads/*, change the name to refs/remotes/origin/<match>, and stuff those into the local repo.
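That renaming step can be sketched for the common one-wildcard case. The same mapping is what the git merge documentation quoted earlier relies on. This is a simplified model, not git's own matcher: it ignores negative refspecs and handles only a single `*` on each side. A remote can also have several fetch lines, which `git config --get-all remote.origin.fetch` lists.

```python
from typing import Optional


def map_through_refspec(ref: str, refspec: str) -> Optional[str]:
    """Return the local ref that remote `ref` is stored under, or None.

    refspec looks like '+refs/heads/*:refs/remotes/origin/*'; the leading
    '+' (force) does not affect the name mapping, so it is dropped.
    """
    src, _, dst = refspec.lstrip("+").partition(":")
    if not dst:
        return None  # no destination: no local ref is updated
    if "*" not in src:
        return dst if ref == src else None
    prefix, _, suffix = src.partition("*")
    if (ref.startswith(prefix) and ref.endswith(suffix)
            and len(ref) >= len(prefix) + len(suffix)):
        middle = ref[len(prefix):len(ref) - len(suffix)]
        return dst.replace("*", middle, 1)
    return None
```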

It's possible to change this, so that git fetch does not update origin/master. If someone has done that, git rev-list origin/master..master will not be useful. (And I'm not sure if we'd get commits D, E, and F either! I've never run with a crazy fetch config.)

To summarize, you need to figure out:

  • which remote(s) to contact, if any
  • which local branches (refs/heads/*) correspond to those remotes (for pull and/or push)
  • whether their branch heads are related to ours (whether by the same names or different names)
  • whether pushing will push to the same name (matching, current, simple-if-name-same), a potentially different name (upstream), or "never" (nothing, simple-if-name-different)
  • whether, if you choose not to contact some or all remotes, to trust refs in refs/remotes/ (based on remote.name.fetch lines)

It's all quite messy, because push and fetch are asymmetric. It's possible that git push blarg will push matching (so if blarg has a branch named glink, we'll push our glink there, even if glink has no branch.glink.remote set). There are also config variables remote.pushdefault, remote.name.push, and so on, and more configurations for fetch as well (again, see the git-config documentation).

(I suspect you're best off just running git fetch, and then probably using git branch -vv.)
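For a script, `git for-each-ref` is a steadier interface than the human-oriented `git branch -vv` output. Its `%(upstream:short)` and `%(upstream:track)` format atoms give each local branch's configured upstream and an ahead/behind summary such as `[ahead 12, behind 17]`. The atom is empty when the branch is in sync and `[gone]` when the upstream no longer exists. The sketch below is meant to run after the fetch; the tuple it returns is an illustrative choice. It only compares each branch with its configured upstream. For arbitrary pairs, such as a fork's branch against upstream/master, `git rev-list --left-right --count A...B` still applies.

```python
import re
import subprocess
from typing import List, Tuple

# %09 inserts a TAB between the fields.
FORMAT = "%(refname:short)%09%(upstream:short)%09%(upstream:track)"


def parse_track(track: str) -> Tuple[int, int]:
    """'[ahead 12, behind 17]' -> (12, 17); '' (in sync) -> (0, 0)."""
    ahead = re.search(r"ahead (\d+)", track)
    behind = re.search(r"behind (\d+)", track)
    return (int(ahead.group(1)) if ahead else 0,
            int(behind.group(1)) if behind else 0)


def parse_line(line: str) -> Tuple[str, str, int, int, bool]:
    """(branch, upstream, ahead, behind, upstream_gone) for one output line."""
    name, upstream, track = (line.split("\t") + ["", ""])[:3]
    ahead, behind = parse_track(track)
    return name, upstream, ahead, behind, track == "[gone]"


def branch_tracking(cwd=None) -> List[Tuple[str, str, int, int, bool]]:
    """Tracking status of every local branch, as recorded by the last fetch."""
    out = subprocess.run(
        ["git", "for-each-ref", f"--format={FORMAT}", "refs/heads"],
        cwd=cwd, check=True, capture_output=True, text=True,
    ).stdout
    return [parse_line(line) for line in out.splitlines() if line]
```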

torek answered Sep 29 '22 17:09