
What is the git command used by GitHub PR to show differences

Tags: git, github

I would like to use the command line to generate the same difference presented by GitHub. The help page describes it as “a comparison between the tip of the head branch and the commit at which the head was last synced with the base branch” (https://help.github.com/articles/about-comparing-branches-in-pull-requests)

If the PR is to pull branch a into branch b, what is the git diff command line?

In particular I am wondering why when both a and b were created off branch m and I merge a subsequent change made on branch m to both a and b, the change appears in the PR.

Example: commit graph for branches m, a, and b (read bottom to top)

b m a
4   5 merge branch m at commit 3 to branch a and b
|\ /| 
| 3 | commit on branch m
| | 2 create branch a off m at commit 0 and commit
1 | | create branch b off m at commit 0 and commit
 \|/
  0   start with branch m at commit 0

Head of branch to be merged:

git rev-parse a
5

Base of merge:

git merge-base b a
3

git command shows differences introduced in commit 2:

git diff b...a

But GitHub PR shows differences introduced in commits 2 and 3

asked Sep 11 '25 19:09 by Ken Taylor


2 Answers

Edit: I've augmented this with actual output from local commands and GitHub, and now I can't explain what shows up in their pull request comparisons at all. It's utter nonsense, much like a lot of other stuff that they show on their web pages. The GitHub display shows commits that are already present in the target branch, as if the merge is going to make changes that it won't actually make. That is, for "merge branch 'a' into b", the only commits that will be added are those that are not already contained in 'b', yet GitHub's display shows two commits that are already contained in 'b'.


TL;DR

Usually (but see the long answer):

git diff $base...$head

where $base and $head are from the long answer below. Edit: Well, that's what they should show. It turns out that's not what they actually show.

Using your example repository

You set up a repository at https://github.com/kenocrates/ex1, which I cloned:

$ git clone https://github.com/kenocrates/ex1
...
$ cd ex1

To obtain the refs/pull references from them you then modified your configuration so that it reads, in part:

[remote "origin"]
        url = https://github.com/kenocrates/ex1
        fetch = +refs/heads/*:refs/remotes/origin/*
        fetch = +refs/pull/*/head:refs/remotes/origin/pr/*

which I also did. Then:

$ git fetch origin
From https://github.com/kenocrates/ex1
 * [new ref]         refs/pull/1/head -> origin/pr/1
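If you would rather not edit .git/config by hand, the same refspec can be added with git config. This is a minimal sketch, assuming a throwaway repository made with mktemp (that directory is only for illustration); the URL and refspec are the ones shown above, and nothing is fetched.

```shell
#!/bin/sh
# Sketch: add the pull-request refspec from the command line.
set -e
repo=$(mktemp -d)                 # throwaway repository, for illustration only
cd "$repo"
git init -q .
# "git remote add" writes the usual +refs/heads/* fetch line for us.
git remote add origin https://github.com/kenocrates/ex1
# --add appends a second fetch line instead of replacing the first.
git config --add remote.origin.fetch '+refs/pull/*/head:refs/remotes/origin/pr/*'
refspecs=$(git config --get-all remote.origin.fetch)
echo "$refspecs"
```

After this, a plain git fetch origin picks up refs/pull/N/head as origin/pr/N, as in the transcript above.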

The command I use now, to find the correct merge base or bases, is:

$ git merge-base --all refs/remotes/origin/pr/1 refs/remotes/origin/b
587593749ee46806ed2c9fd06cf8b904bbce255a

since my full name for the commit that would be merged by merging the pull request is refs/remotes/origin/pr/1, and my full name for what GitHub is calling the "base branch" is refs/remotes/origin/b. Note that we can also use the raw hash IDs, or shorter names:

$ git rev-parse origin/pr/1
1cef243f9efe6e94c9926f7992efb6c093188b8c
$ git rev-parse origin/b
48728bc19480e0c1cc9e3a399634a5f389881c47

(leaving out refs/remotes/, which, as described in the gitrevisions documentation, will be assumed by the fifth step in the six-step process Git normally uses to resolve a name).
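To see that name resolution for yourself, here is a small sketch. It assumes a scratch repository, and it fakes the remote-tracking ref with git update-ref so that no network access is needed; the ref name origin/b simply mirrors the one used above.

```shell
#!/bin/sh
# Sketch: a short name such as origin/b expands to refs/remotes/origin/b.
set -e
repo=$(mktemp -d)                     # throwaway repository, for illustration only
cd "$repo"
git init -q .
git config user.name "Example"        # placeholder identity for the commit
git config user.email "example@example.com"
git commit -q --allow-empty -m "initial"
# Stand-in for a fetched remote-tracking ref (an assumption of this sketch).
git update-ref refs/remotes/origin/b HEAD

full_name=$(git rev-parse --symbolic-full-name origin/b)
short_hash=$(git rev-parse origin/b)
long_hash=$(git rev-parse refs/remotes/origin/b)
echo "$full_name"
```

Both spellings give the same hash, and --symbolic-full-name reports which full reference the short name was expanded to.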

$ git merge-base --all 1cef243f9efe6e94c9926f7992efb6c093188b8c 48728bc19480e0c1cc9e3a399634a5f389881c47
587593749ee46806ed2c9fd06cf8b904bbce255a

Hence a correct diff (full, not one-commit-at-a-time) is available via:

$ git diff 587593749ee46806ed2c9fd06cf8b904bbce255a origin/b

or the shorter:

$ git diff origin/pr/1...origin/b

either of which outputs:

diff --git a/file b/file
index 4fc2681..186222c 100644
--- a/file
+++ b/file
@@ -1,6 +1,7 @@
 Section A

 Section B
+line 1

 Section C
 line 1

This is what GitHub should show as it is what would be merged, and the effect it would have, if we were to accept the pull request. What they actually show is different. Per https://github.com/kenocrates/ex1/pull/1/commits, what we see is four commits, two of which are already contained within the target branch:

1cef243f9efe6e94c9926f7992efb6c093188b8c
587593749ee46806ed2c9fd06cf8b904bbce255a
23c2ff68c02207a2f172090566d7b2c75b6f1c16
b21d3c4067261aa295319f177ad1629b5ae12818

Here's what's actually in the repository on GitHub including the pull request (though not its merge), with the usual name-changes resulting from the fact that we're using our copies of the repositories and we have not set ours up as pure mirrors. (Making a pure mirror would make our own local repository useless for doing any work in.)

$ git log --all --decorate --oneline --graph
*   48728bc (origin/b) Merge branch 'm' into b
|\  
* | f365142 added line to section B
| | *   1cef243 (HEAD -> a, origin/pr/1, origin/a, origin/HEAD) Merge branch 'm' into a
| | |\  
| | |/  
| |/|   
| * |   5875937 (origin/m) Merge branch 'p' into m
| |\ \  
|/ / /  
| * | 23c2ff6 added line to section C
|/ /  
| * b21d3c4 added line to section A
|/  
* 1ff1a3c added file

This graph is pretty tangled and difficult to read, but, if we were to actually do this proposed merge, these are the commits that we'd add via the other leg, that are not already on origin/a:

$ git log --decorate --oneline --graph $(git merge-base origin/pr/1 origin/b)..origin/b
* 48728bc (origin/b) Merge branch 'm' into b
* f365142 added line to section B

which is why we see what we see in the local diff. For some reason, the base GitHub is choosing is not correct.
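The same kind of range can be tried on the graph from the question. The sketch below rebuilds branches m, a, and b in a scratch repository (empty commits whose subjects match the question's commit numbers; all of that is an assumption made for illustration) and lists what lies between the merge base and a.

```shell
#!/bin/sh
# Sketch: list the commits between the merge base of b and a, and the tip of a.
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/m         # name the initial branch m
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"
git commit -q --allow-empty -m "commit 0"
git branch a
git branch b
git checkout -q b; git commit -q --allow-empty -m "commit 1"
git checkout -q a; git commit -q --allow-empty -m "commit 2"
git checkout -q m; git commit -q --allow-empty -m "commit 3"
git checkout -q b; git merge -q -m "commit 4" m
git checkout -q a; git merge -q -m "commit 5" m

base=$(git merge-base b a)
# Commits reachable from a but not from the merge base, sorted by subject.
subjects=$(git log --format=%s "$base..a" | sort)
echo "$subjects"
```

Only commit 2 and the merge commit 5 are listed; commit 3 is the merge base itself, so it is excluded from the range.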

Let's try one more experiment—let's grab the actual merge commit that GitHub made, and see how it looks in git log --all --decorate --graph --oneline:

$ git config --add remote.origin.fetch '+refs/pull/*/merge:refs/remotes/origin/pr-merge/*'
$ git fetch
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
From https://github.com/kenocrates/ex1
 * [new ref]         refs/pull/1/merge -> origin/pr-merge/1

$ git log --all --decorate --graph --oneline
*   f53dc81 (origin/pr-merge/1) Merge 1cef243f9efe6e94c9926f7992efb6c093188b8c into 48728bc19480e0c1cc9e3a399634a5f389881c47
|\  
| *   1cef243 (HEAD -> a, origin/pr/1, origin/a, origin/HEAD) Merge branch 'm' into a
| |\  
| * | b21d3c4 added line to section A
* | |   48728bc (origin/b) Merge branch 'm' into b
|\ \ \  
| | |/  
| |/|   
| * |   5875937 (origin/m) Merge branch 'p' into m
| |\ \  
| | |/  
| |/|   
| | * 23c2ff6 added line to section C
| |/  
* | f365142 added line to section B
|/  
* 1ff1a3c added file

This is just what we should expect: a merge commit whose first parent origin/pr-merge/1^1 is 48728bc19480e0c1cc9e3a399634a5f389881c47 aka origin/b, and whose second parent origin/pr-merge/1^2 is 1cef243f9efe6e94c9926f7992efb6c093188b8c aka HEAD, a, origin/pr/1, origin/a, and origin/HEAD.
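The ^1 and ^2 suffixes can be checked on any merge. A minimal sketch, assuming a scratch repository with two hypothetical branches named target and topic (standing in for the base and head of a pull request):

```shell
#!/bin/sh
# Sketch: after "git merge topic" on target, ^1 is the old target tip
# and ^2 is the tip of the branch that was merged in.
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/target    # hypothetical base branch
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"
git commit -q --allow-empty -m "root"
git checkout -q -b topic                   # hypothetical head branch
git commit -q --allow-empty -m "work on topic"
git checkout -q target
git commit -q --allow-empty -m "work on target"

target_tip=$(git rev-parse target)         # tips before merging
topic_tip=$(git rev-parse topic)
git merge -q --no-edit topic               # creates a two-parent merge commit
first=$(git rev-parse HEAD^1)
second=$(git rev-parse HEAD^2)
echo "first parent:  $first"
echo "second parent: $second"
```

The first parent is the branch you were on when merging, which matches the layout of refs/pull/1/merge described above.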

Long

The answer [edit: except it's not the answer GitHub actually uses] is embedded in the phrase:

“a comparison between the tip of the head branch and the commit at which the head was last synced with the base branch”

and to comprehend (and reproduce) this we have to define each term.

The tip of a branch (name) is simply the commit to which the branch name resolves. For instance, to find the hash ID of the tip of branch master you could run:

git rev-parse master

In general, for most commands, using the branch name has the same effect as using this branch-tip hash ID, so we don't even have to bother with git rev-parse here. (There are some exceptions to this rule in Git: sometimes a name means more than just a raw hash ID.)

Next we have the head branch and the base branch. Here, it's GitHub, not Git, defining these terms. The head branch in this case is the branch on which you're doing the pull request, and the base is the one you're saying "into": please pull feature-X into master means head = feature-X and base = master; so

If the PR is to pull branch a into branch b ...

then "head" is a and "base" is b, so you could do head=a and base=b and use $head and $base below. (For the paranoid, fetch the appropriate reference and use refs/pull/$N/head or its hash ID, as noted below.)

Last, we have the phrase the commit at which ... was last synced .... This phrase is partly defined by GitHub, since Git doesn't use that phrasing, but it really means the (ideally single) commit that is the merge base of these two commits. The merge base is defined via the commit graph, so you need enough of the graph to find it. The entire graph is always enough, so if you have the pull-request commit—whose hash is stored in a reference named refs/pull/N/head—and the base branch hash, stored in refs/heads/base—running:

git merge-base --all refs/pull/$N/head refs/heads/$base

or more simply:

git merge-base --all $head $base

will produce the hash(es) of the merge base(s).

(If there is more than one, Git creates the actual merge base by merging the merge bases at the time you run git merge -s recursive. Note that -s recursive was the default strategy when this was written; since Git 2.34 the default is ort, which treats multiple merge bases the same way. Whether GitHub also does this, and under what conditions, I do not know. Note that relying on $base to resolve to refs/heads/$base is reasonably reliable, but misfires if you use the same name for a tag and a branch!)
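Multiple merge bases usually come from a criss-cross merge, where two branches each merge the other's earlier tip. Here is a sketch of that situation, assuming a scratch repository and two hypothetical branches x and y:

```shell
#!/bin/sh
# Sketch: a criss-cross merge gives "git merge-base --all" two answers.
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/x         # hypothetical branch x
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"
git commit -q --allow-empty -m "root"
git branch y                               # hypothetical branch y, also at root
git commit -q --allow-empty -m "X1"
x1=$(git rev-parse HEAD)
git checkout -q y
git commit -q --allow-empty -m "Y1"
y1=$(git rev-parse HEAD)
git merge -q --no-edit "$x1"               # y now contains X1 and Y1
git checkout -q x
git merge -q --no-edit "$y1"               # x now contains X1 and Y1 too

all_count=$(git merge-base --all x y | wc -l | tr -d ' ')
one_count=$(git merge-base x y | wc -l | tr -d ' ')
echo "with --all: $all_count, without: $one_count"
```

Neither X1 nor Y1 is an ancestor of the other, so both are merge bases; without --all, git merge-base prints just one of them.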

The diff, therefore, is from the merge base—the hash output by the above git merge-base command—to what GitHub are calling the "head branch". You could run the git merge-base command and verify that it produces exactly one hash:

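hash=$(git merge-base --all "$head" "$base")  # --all prints every merge base, one per line
# require exactly one merge base (non-empty, single line)
[ -n "$hash" ] && [ "$(printf '%s\n' "$hash" | wc -l)" -eq 1 ] || { echo "need exactly one merge base" >&2; exit 1; }
git diff "$hash" "$head"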

or rely on the special git diff syntax using three dots: git diff A...B means (see footnote 1): find the merge base(s) between A and B, pick one at random, and run git diff from that one to B.
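To check that the two spellings agree, the graph from the question can be rebuilt locally. This is only a sketch: the scratch repository, the file names, and the small commit helper are assumptions made for illustration, with one file per commit so the merges cannot conflict.

```shell
#!/bin/sh
# Sketch: "git diff b...a" versus "git diff <merge base> a" on the question's graph.
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/m         # name the initial branch m
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"

commit() {  # commit <file> <message>: append a line to <file> and commit it
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit base.txt "commit 0"                     # 0: start of m
git checkout -q -b b; commit b.txt "commit 1"  # 1: branch b off m
git checkout -q m
git checkout -q -b a; commit a.txt "commit 2"  # 2: branch a off m
git checkout -q m;    commit m.txt "commit 3"  # 3: later commit on m
git checkout -q b; git merge -q --no-edit m    # 4: merge m into b
git checkout -q a; git merge -q --no-edit m    # 5: merge m into a

base=$(git merge-base b a)
three_dot=$(git diff --name-only b...a)
explicit=$(git diff --name-only "$base" a)
echo "merge base:     $(git log -1 --format=%s "$base")"
echo "three-dot diff: $three_dot"
echo "explicit diff:  $explicit"
```

Both diffs list only a.txt, the file added in commit 2, and the merge base is commit 3, which matches what the question reports for the local command.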

This is where the short version above comes from. As noted above, if you have a branch and tag that use the same name, $base or $head may resolve incorrectly to the tag instead of the branch, so if you're paranoid, spell out the full references.
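The tag-versus-branch problem is easy to reproduce. A minimal sketch, assuming a scratch repository in which a tag and a branch are both (hypothetically) named topic but point at different commits:

```shell
#!/bin/sh
# Sketch: an ambiguous short name resolves to the tag, not the branch.
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"
git commit -q --allow-empty -m "first"
first=$(git rev-parse HEAD)
git tag topic                              # lightweight tag at the first commit
git commit -q --allow-empty -m "second"
second=$(git rev-parse HEAD)
git branch topic                           # branch of the same name at the second commit

# Git warns on stderr that the name is ambiguous, then picks refs/tags/topic.
short=$(git rev-parse topic 2>/dev/null)
full=$(git rev-parse refs/heads/topic)
echo "topic            -> $short"
echo "refs/heads/topic -> $full"
```

The short name lands on the tagged commit, while the spelled-out refs/heads/topic gives the branch tip, which is why the full reference is the safe choice in scripts.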


1. There's a long-standing but minor bug in git diff with three-dot merge syntax when there are multiple merge bases. Multiple merge bases are pretty rare and you may not care.

answered Sep 13 '25 11:09 by torek


GitHub now has an official CLI tool that offers exactly this. Just run gh pr diff [PrNumber].

More info at https://cli.github.com/manual/gh_pr_diff.

answered Sep 13 '25 13:09 by Moshe Weitzman