Understanding git rev-list

While looking for Git hook examples, I came across the following pre-receive hook: https://github.com/Movidone/git-hooks/blob/master/pre-receive and I wanted to understand this command:

git rev-list $new_list --not --all 

where new_list is obtained from:

NULL_SHA1="0000000000000000000000000000000000000000" # 40 0's
new_list=
any_deleted=false
while read oldsha newsha refname; do
    case $oldsha,$newsha in
        *,$NULL_SHA1) # it's a delete
            any_deleted=true;;
        $NULL_SHA1,*) # it's a create
            new_list="$new_list $newsha";;
        *,*) # it's an update
            new_list="$new_list $newsha";;
    esac
done

I figured that rev-list shows commits in reverse chronological order.

But, can someone share more insight on what the --not and --all options are meant for?

As per the documentation:

--not
Reverses the meaning of the ^ prefix (or lack thereof) for all following revision specifiers, up to the next --not.
--all
Pretend as if all the refs in refs/ are listed on the command line as <commit>. 

I am not able to completely understand these options.

[Update] After doing some test commits, I figured out that if I don't use the --not and --all options, git rev-list lists all the commits on the branch, not just the ones I intend to push.

However, I wanted to understand: why doesn't it print the sha values on the terminal when the --all option is passed?

iDev asked Oct 16 '20 22:10



2 Answers

The git rev-list command is a very complicated, very central command in Git, as what it does is walk the graph. The word graph here refers to both the commit graph itself, and in some cases, the next level down (Git objects reachable from commits).

I figured that rev-list shows commits in reverse chronological order.

Not exactly, but close:

  • The order is changeable. The default is reverse-chronological.
  • The default is to walk some commits, but you can get rev-list to go deeper so as to include tree and blob objects and even tag objects. This is for programs like git pack-objects, which git fetch and git push invoke. I plan to ignore this possibility entirely here, but I feel I should at least mention it. 😀

So the default is to list some commits in reverse chronological order. It is both important, and a little bit tricky, to specify exactly which parts of the graph we will have git rev-list walk: the some in some commits.

But, can someone share more insight on what --not and --all options are meant for?

As VonC notes, the effect here is to list commits that are new to the receiving repository. This depends on the fact that this git rev-list command is running in a pre-receive hook. It generally doesn't do anything useful outside this particular hook. Thus, as you can see, a hook's run-time environment, in Git, is often at least a little bit special. (This is true for more than just the pre-receive hook: one must think about each hook's activation context.)

More about --not --all

The --all option does just what you quoted from the documentation:

Pretend as if all the refs in refs/ are listed on the command line ...

So this does the equivalent of a git for-each-ref refs: it loops over each reference. That includes branch names (master or main, develop, feature/tall, and so on, all of which are really in refs/heads/), tag names (v1.2 which is really refs/tags/v1.2), remote-tracking names (origin/develop which is really refs/remotes/origin/develop), replacement refs (in refs/replace/), the stash (refs/stash), bisection refs, Gerrit refs if you're using Gerrit, and so on. Note that it does not loop over reflog entries.
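To make the "pretend all the refs are listed" idea concrete, here is a minimal sketch that builds a throwaway repository and compares git rev-list --all with naming every ref by hand. The repository layout, the commit messages, and the user identity are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: compare `git rev-list --all` with naming every ref by hand.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main        # fixed branch name for the demo
git config user.name "Example"               # hypothetical identity
git config user.email "example@example.invalid"
git config commit.gpgsign false

git commit -q --allow-empty -m "A"           # on main
git tag v1.2                                 # creates refs/tags/v1.2
git checkout -q -b develop                   # creates refs/heads/develop
git commit -q --allow-empty -m "B"

# Hash IDs of everything under refs/, as for-each-ref reports them.
refs=$(git for-each-ref --format='%(objectname)' refs/)

with_all=$(git rev-list --all | sort)
by_hand=$(git rev-list $refs | sort)
count=$(git rev-list --count --all)

echo "commits found via --all: $count"
```

Both lists should contain the same two commits here, since HEAD is attached to develop and every commit is reachable from some ref.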

The --not prefix is a simple boolean operation. In the gitrevisions syntax—see the gitrevisions documentation—we can write things like develop, meaning I tell you to start from develop and work backwards and include these commits, but also things like ^develop, meaning I tell you to start from develop and work backwards and exclude these commits. So if I write:

git rev-list feature1 feature2 ^main

I am asking Git to walk commits reachable from the commits identified by the names feature1 and feature2, but to exclude commits reachable from the commits identified by main. For (much) more about the general idea of reachability and graph-walking, see Think Like (a) Git.

The --not operator effectively flips the ^ on each ref:

git rev-list --not feature1 feature2 ^main

is shorthand, as it were, for:

git rev-list ^feature1 ^feature2 main

This walks the list of commits reachable from main, but excludes those reachable from either feature1 or feature2.
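The flip can be checked in a scratch repository. The sketch below assumes a tiny made-up history (one shared base commit, one commit each on feature1 and feature2, and one extra commit on main) and compares the --not form with the spelled-out ^ form.

```shell
#!/bin/sh
# Sketch: --not flips the meaning of the ^ prefix on the refs that follow it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"               # hypothetical identity
git config user.email "example@example.invalid"
git config commit.gpgsign false

git commit -q --allow-empty -m "base"        # shared by every branch
git checkout -q -b feature1
git commit -q --allow-empty -m "f1"
git checkout -q -b feature2 main
git commit -q --allow-empty -m "f2"
git checkout -q main
git commit -q --allow-empty -m "main-only"

# Reachable from feature1 or feature2, but not from main.
include_count=$(git rev-list --count feature1 feature2 ^main)

# The two spellings that should be equivalent.
flipped=$(git rev-list --not feature1 feature2 ^main | sort)
spelled=$(git rev-list ^feature1 ^feature2 main | sort)

echo "feature-only commits: $include_count"
echo "flipped form lists:   $flipped"
```

In this history the flipped form lists only the "main-only" commit, because "base" is also reachable from both feature branches.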

Usually all commits are findable with --all

If you are using Git in the normal everyday way, and don't have a "detached HEAD" at the moment (detached HEAD mode is not exactly abnormal, but it's not the usual way to work), the --all option to git rev-list tells it to include all commits, because every commit is reachable from at least one reference. So --not --all effectively excludes all commits, and adding --not --all to any git rev-list that would otherwise list some commits inhibits the list entirely. The output is empty: why did we bother?

If you are in detached HEAD mode and have made several new commits (this can happen when you are in the middle of an interactive or conflicted rebase, for instance), those commits are reachable from HEAD but not from any branch name. Note that in current Git, --all also includes HEAD itself (the current documentation reads "all the refs in refs/, along with HEAD"), so git rev-list HEAD --not --all still prints nothing here; git rev-list HEAD --not --branches lists those commits instead. In that rebase, for instance, that would be just those commits that you have copied so far.

So "detached HEAD" mode would be one place where this kind of exclusion could be useful from the command line. But for the situation you're examining, a pre-receive hook, we're not really on the command line.
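A minimal sketch of the detached-HEAD case, in a throwaway repository with made-up commits. It uses --not --branches to find the commits that exist only on the detached HEAD, and also records what --not --all prints, on the assumption (taken from the current git rev-list documentation) that --all includes HEAD as well as the refs in refs/.

```shell
#!/bin/sh
# Sketch: commits that exist only on a detached HEAD.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"               # hypothetical identity
git config user.email "example@example.invalid"
git config commit.gpgsign false

git commit -q --allow-empty -m "on main"
git checkout -q --detach                     # HEAD no longer names a branch
git commit -q --allow-empty -m "detached 1"
git commit -q --allow-empty -m "detached 2"

# Reachable from HEAD, but not from any branch name.
detached_count=$(git rev-list --count HEAD --not --branches)

# --all is documented to include HEAD too, so nothing is left over.
with_all=$(git rev-list HEAD --not --all)

echo "commits only on detached HEAD: $detached_count"
```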

Pre-receive hooks

When someone uses git push to send commits to your own Git, your Git:

  • sets up a quarantine area to hold any new objects (new commits and blobs and so on);[1]
  • negotiates with the sender to decide what the sender should send;
  • receives these objects; and
  • takes a list of ref update requests. These update requests essentially just say make this name hold this hash ID.[2]

Before actually doing any of the requested updates, your Git:

  1. Feeds the entire list to the pre-receive hook. That hook can say "no"; if so, the entire push, as a whole, is rejected.
  2. If that says "ok", feeds the list, one request at a time, to the update hook. When that hook says "ok", does the update. If the hook says "no", your Git rejects the one update, but goes on to examine others.
  3. After all updates are accepted or rejected in step 2, feeds the accepted list to the post-receive hook.

Objects that are needed, that were added to some ref in step 2, are moved from quarantine to Git's object database. Those that were rejected are not.
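The hook order described above can be observed with a small experiment. This sketch sets up a hypothetical bare "server" repository whose three hooks each append their own name to a log file, then pushes one branch to it from a scratch client. All paths and names are invented for the demonstration.

```shell
#!/bin/sh
# Sketch: observe the order in which the receive-side hooks run.
set -e
work=$(mktemp -d)
log="$work/hooks.log"

git init -q --bare "$work/server.git"
mkdir -p "$work/server.git/hooks"

for hook in pre-receive update post-receive; do
    # Unquoted heredoc: $hook and $log are expanded now, while writing the file.
    cat > "$work/server.git/hooks/$hook" <<EOF
#!/bin/sh
cat > /dev/null          # drain any ref-update lines passed on stdin
echo "$hook" >> "$log"
exit 0                   # say "ok"
EOF
    chmod +x "$work/server.git/hooks/$hook"
done

git init -q "$work/client"
cd "$work/client"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"               # hypothetical identity
git config user.email "example@example.invalid"
git config commit.gpgsign false
git commit -q --allow-empty -m "first"

git push -q "$work/server.git" main          # one ref, so update runs once

order=$(tr '\n' ' ' < "$log")
echo "hook order: $order"
```

With a single ref pushed, the log should show pre-receive, then update, then post-receive. Pushing several refs at once would run the update hook once per ref.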

Now, think about a typical git push. We get some new commit(s) and a request: create a new branch name feature/short, or we get some new commit(s) and a request: update existing branch name develop to include these new commits, along with the old ones.

In step 1 above, we have a single new hash ID. We ran a loop to read all the ref names, and their current and proposed-new hash IDs, and the loop ran only once, because only one name was being git push-ed. That hash ID refers to the new commit or commits, that will either be added to this existing branch, or be the tip and other commits that are exclusive to the new branch.

We'd now like to inspect these commits, and not any of the existing commits that are reachable from any existing branch. For simplicity, rather than $new_list in my other answer, let's suppose we have just the one new hash ID, $new, and the old hash ID for the branch name, $old: all-zeros if the branch is all-new, or some valid existing commit if it's an existing branch name.

If the new commits are on a completely new branch, then:

git rev-list $new ^master ^develop ^feature/short ^feature/tall

would cover them, for instance, if we knew that the only existing branches were these four (and that there are no tags etc to worry about). But what if they're being added to, say, develop? Then we'd like to exclude the commits that are currently on develop. We could use the $old hash ID to do that:

git rev-list $new ^master ^$old ^feature/short ^feature/tall

That would again list only the new commits that whoever is running git push origin develop wants to add to our develop.

But think about $old. This is a hash ID. Where did Git get it? Git got this hash ID from the name develop. This is a pre-receive hook; the name develop has not been updated yet. So the name develop is a name for the old hash ID $old. That means:

git rev-list $new ^master ^develop ^feature/short ^feature/tall

will also do the job.

If git rev-list $new followed by "and not all existing" will do the job, then:

git rev-list $new --not --branches

will do the job. That's almost what we have here.

The bug with just using --branches is that it doesn't get any tags, or other refs. We could use --not --branches --tags but --not --all is shorter and also gets all other refs.
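The difference between --branches and --all shows up as soon as some existing commit is reachable only from a tag. The sketch below simulates that outside a hook: it creates a commit "T" that only the tag v1 points to, then builds two commits on top of it that no ref names (standing in for freshly pushed commits). The history and names are invented for the demonstration.

```shell
#!/bin/sh
# Sketch: --not --branches misses commits that only a tag keeps alive.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"               # hypothetical identity
git config user.email "example@example.invalid"
git config commit.gpgsign false

git commit -q --allow-empty -m "A"           # main
git checkout -q -b develop
git commit -q --allow-empty -m "B"           # develop
git checkout -q -b release-tmp main
git commit -q --allow-empty -m "T"
git tag v1                                   # T stays reachable via the tag
git checkout -q main
git branch -q -D release-tmp                 # ... but not via any branch

# Two commits that no ref names, standing in for newly pushed commits.
git checkout -q --detach v1
git commit -q --allow-empty -m "N1"
git commit -q --allow-empty -m "N2"
new=$(git rev-parse HEAD)
git checkout -q main                         # reattach HEAD to a branch

by_branches=$(git rev-list --count "$new" --not --branches)   # N2, N1 and T
by_all=$(git rev-list --count "$new" --not --all)             # N2 and N1 only

echo "--not --branches: $by_branches commits"
echo "--not --all:      $by_all commits"
```

Here --not --branches reports the already-present commit T as if it were new, while --not --all excludes it because the tag v1 reaches it.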

So this is where --not --all comes from: it depends on the special case of a pre-receive hook. We list the new hash IDs, as proposed by whoever is running a git push, that our Git has passed to us as a list of lines. We have git rev-list walk the proposed-to-be-updated commit graph, looking at the new commits in the quarantine area, but excluding all the commits that are already in our repository. The rev-list command produces these hash IDs, one per line, which we then read in a shell loop, and do whatever we like to inspect each commit.
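Putting it together, here is a minimal sketch of such a hook, not the one from the linked repository. It assumes a scratch bare repository and a hypothetical log file, new-commits.log, written inside that repository; the hook records the subject line of each commit it considers new. Two pushes are made to show that the second one reports only the commit added since the first.

```shell
#!/bin/sh
# Sketch: a pre-receive hook that lists only the commits new to the receiver.
set -e
work=$(mktemp -d)
git init -q --bare "$work/server.git"
mkdir -p "$work/server.git/hooks"

# Quoted heredoc: nothing is expanded here; the hook text is written as-is.
cat > "$work/server.git/hooks/pre-receive" <<'EOF'
#!/bin/sh
NULL_SHA1="0000000000000000000000000000000000000000"
new_list=
while read oldsha newsha refname; do
    case $newsha in
        $NULL_SHA1) ;;                         # delete: nothing new to walk
        *) new_list="$new_list $newsha";;      # create or update
    esac
done
[ -z "$new_list" ] && exit 0
# Push hooks run with the repository directory as the working directory,
# so this relative path lands inside server.git.
git rev-list $new_list --not --all |
while read sha; do
    git log -1 --format='%s' "$sha" >> new-commits.log
done
exit 0
EOF
chmod +x "$work/server.git/hooks/pre-receive"

git init -q "$work/client"
cd "$work/client"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"               # hypothetical identity
git config user.email "example@example.invalid"
git config commit.gpgsign false

git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"
git push -q "$work/server.git" main          # hook sees: two, one

git commit -q --allow-empty -m "three"
git push -q "$work/server.git" main          # hook sees: three

log="$work/server.git/new-commits.log"
cat "$log"
```

A real hook would replace the git log line with whatever inspection it needs, and exit non-zero to reject the push.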


[1] The quarantine area was new in Git 2.11. Prior to that, new objects could remain in the repository for a while, even if the push was rejected. The quarantine area isn't really that big a deal for most people, but for big servers like GitHub, it can save them a lot of disk space.

[2] The request can be forced or not-forced, and if forced, could be a force-with-lease, or not. This information is not available in the pre-receive hook (nor in the update hook), which is, um, let's just say not so great, but there are compatibility issues with adding it. It's all livable, mostly, though. The hook can tell if it's a create new ref or delete existing ref request because if so, one of the two hash IDs (old or new) will be the all-zeros "null hash" (which is reserved; no hash ID is allowed to be all-zeros).

torek answered Oct 19 '22 22:10


It means:

  • List commits that are reachable by following the parent links from the given commit(s), here $new_list, the new tip commits of the refs being created or updated
  • but exclude commits that are reachable from the one(s) given with a ^ in front of them, here "all", that is, everything reachable from any existing ref (branch heads, tags, and so on).

That limits the rev-list output to only the new commits received, rather than all commits (received and already present in the receiving repository).

VonC answered Oct 19 '22 22:10