
Github API: Retrieve all commits for all branches for a repo

Tags:

github-api

According to the V2 documentation, you can list all commits for a branch with:

commits/list/:user_id/:repository/:branch

I am not seeing the same functionality in the V3 documentation.

I would like to collect all branches using something like:

https://api.github.com/repos/:user/:repo/branches

And then iterate through them, pulling all commits for each. Alternatively, if there's a way to pull all commits for all branches for a repo directly, that would work just as well if not better. Any ideas?

UPDATE: I tried passing the branch :sha as a param as follows:

params = {:page => 1, :per_page => 100, :sha => b}

The problem is that when I do this, the results are not paged properly. I feel like I'm approaching this incorrectly. Any thoughts?

adamrneary asked Feb 07 '12 16:02




2 Answers

I have encountered the exact same problem. I did manage to acquire all the commits for all branches within a repository (probably not that efficient due to the API).

Approach to retrieve all commits for all branches in a repository

As you mentioned, first you gather all the branches:

# https://api.github.com/repos/:user/:repo/branches
https://api.github.com/repos/twitter/bootstrap/branches

The key point you are missing is that the v3 API for listing commits works from a reference commit, which you supply through the sha parameter of the call that lists commits on a repository. So when you collect the branches, make sure you also pick up each branch's latest sha:

Trimmed result of branch API call for twitter/bootstrap

[
  {
    "commit": {
      "url": "https://api.github.com/repos/twitter/bootstrap/commits/8b19016c3bec59acb74d95a50efce70af2117382",
      "sha": "8b19016c3bec59acb74d95a50efce70af2117382"
    },
    "name": "gh-pages"
  },
  {
    "commit": {
      "url": "https://api.github.com/repos/twitter/bootstrap/commits/d335adf644b213a5ebc9cee3f37f781ad55194ef",
      "sha": "d335adf644b213a5ebc9cee3f37f781ad55194ef"
    },
    "name": "master"
  }
]
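
A minimal sketch of picking up each branch's latest sha from a response like the one above, using Ruby's standard JSON library. The helper name branch_heads is hypothetical, and the sample body is a trimmed copy of the result shown above.

```ruby
require 'json'

# Hypothetical helper: maps each branch name to the sha of its latest commit.
# Assumes `body` is the JSON text returned by the branches API call.
def branch_heads(body)
  JSON.parse(body).each_with_object({}) do |branch, heads|
    heads[branch['name']] = branch['commit']['sha']
  end
end

sample = <<~BODY
  [
    {"name": "gh-pages", "commit": {"sha": "8b19016c3bec59acb74d95a50efce70af2117382"}},
    {"name": "master", "commit": {"sha": "d335adf644b213a5ebc9cee3f37f781ad55194ef"}}
  ]
BODY

heads = branch_heads(sample)
puts heads['master']
```

The resulting hash gives one starting sha per branch for the commit listing calls.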

Working with last commit's sha

As we can see, the two branches have different shas; each one is the sha of the latest commit on that branch. What you can do now is walk through each branch, starting from its latest sha:

# With the sha parameter set to the branch's latest sha
# https://api.github.com/repos/:user/:repo/commits
https://api.github.com/repos/twitter/bootstrap/commits?per_page=100&sha=d335adf644b213a5ebc9cee3f37f781ad55194ef

The above API call lists the most recent 100 commits on the master branch of twitter/bootstrap. To get the next 100 commits, you have to supply a new starting sha. We can use the sha of the last commit in the response (7a8d6b19767a92b1c4ea45d88d4eedc2b29bf1fa in the current example) as input for the next API call. Note that this commit will appear again as the first result of that call:

# Next API call for commits (use the last commit's sha)
# https://api.github.com/repos/:user/:repo/commits
https://api.github.com/repos/twitter/bootstrap/commits?per_page=100&sha=7a8d6b19767a92b1c4ea45d88d4eedc2b29bf1fa

This process is repeated until the sha of the last commit in the response is the same as the sha parameter of the API call.
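
The loop described above can be sketched roughly as follows. The helper walk_commits and the fetch_page callable are hypothetical names. The sketch assumes fetch_page takes a sha and returns the shas from one commits call, newest first, starting with the commit named by the sha parameter. A fake in-memory history stands in for the real API so the example runs without network access.

```ruby
# Hypothetical helper: collects every commit sha reachable from start_sha.
# `fetch_page` is an assumed callable wrapping
# /repos/:user/:repo/commits?per_page=N&sha=<sha>
def walk_commits(start_sha, fetch_page)
  all = []
  sha = start_sha
  loop do
    page = fetch_page.call(sha)
    # Every page after the first begins with a commit we already collected.
    page = page.drop(1) unless all.empty?
    all.concat(page)
    last = page.last
    # Stop when nothing new came back, or the last commit is the sha we asked for.
    break if last.nil? || last == sha
    sha = last
  end
  all
end

# Fake history (newest first) and a fake fetcher returning pages of 3.
history = %w[c5 c4 c3 c2 c1]
fetch = ->(sha) { history[history.index(sha), 3] }

result = walk_commits('c5', fetch)
puts result.inspect
```

In a real client, fetch_page would issue the HTTP request and map the response to shas (or return whole commit objects, with the comparison done on their sha field).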

Next branch

That is it for one branch. Now you apply the same approach for the other branch (work from the latest sha).


There is a large issue with this approach: since branches share many identical commits, you will see the same commits over and over again as you move on to another branch.
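
One way to cope with the repeated commits is to drop duplicates by sha after collecting each branch. This is only a sketch: the helper name unique_commits is hypothetical, and it assumes each commit is a hash with a 'sha' key, as in the API responses.

```ruby
require 'set'

# Hypothetical helper: merges per-branch commit lists, keeping each sha once.
# `commits_by_branch` is assumed to map a branch name to an array of commit hashes.
def unique_commits(commits_by_branch)
  seen = Set.new
  # Set#add? returns nil when the sha was already present, so repeats are skipped.
  commits_by_branch.values.flatten(1).select { |commit| seen.add?(commit['sha']) }
end

commits_by_branch = {
  'master'   => [{ 'sha' => 'c3' }, { 'sha' => 'c2' }, { 'sha' => 'c1' }],
  'gh-pages' => [{ 'sha' => 'd1' }, { 'sha' => 'c2' }, { 'sha' => 'c1' }]
}

unique = unique_commits(commits_by_branch)
puts unique.map { |commit| commit['sha'] }.inspect
```

This does not reduce the number of API requests; it only keeps the collected result free of repeats.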

I can imagine that there is a much more efficient way to accomplish this, but this worked for me.

Kevin Jalbert answered Oct 10 '22 21:10


I asked GitHub support this same question, and this was their answer:

GETing /repos/:owner/:repo/commits should do the trick. You can pass the branch name in the sha parameter. For example, to get the first page of commits from the '3.0.0-wip' branch of the twitter/bootstrap repository, you would use the following curl request:

curl https://api.github.com/repos/twitter/bootstrap/commits?sha=3.0.0-wip

The docs also describe how to use pagination to get the remaining commits for this branch.

As long as you are making authenticated requests, you can make up to 5,000 requests per hour.
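
For the pagination mentioned in that reply, GitHub returns a Link response header that carries the URL of the next page. A minimal sketch of reading it, where the helper name next_page_url is hypothetical and the sample header value is illustrative rather than copied from a real response:

```ruby
# Hypothetical helper: extracts the rel="next" URL from a Link response header.
# Returns nil when there is no next page (or no header at all).
def next_page_url(link_header)
  link_header.to_s[/<([^>]+)>;\s*rel="next"/, 1]
end

# Illustrative header value; real URLs in the header may look different.
sample_link = '<https://api.github.com/repos/twitter/bootstrap/commits?sha=3.0.0-wip&page=2>; rel="next", ' \
              '<https://api.github.com/repos/twitter/bootstrap/commits?sha=3.0.0-wip&page=5>; rel="last"'

next_url = next_page_url(sample_link)
puts next_url
```

A client would keep requesting the returned URL until the helper returns nil, which signals the last page for that branch.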

I used the github gem (https://github.com/peter-murach/github) in my Rails app as follows:

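# owner, repo_name, start_date and end_date are assumed to be defined elsewhere
github_connection = Github.new :client_id => 'your_id', :client_secret => 'your_secret', :oauth_token => 'your_oauth_token'

# Map each branch name to the URL of its latest commit
branches_info = {}
all_branches = github_connection.repos.list_branches owner, repo_name
all_branches.body.each do |branch|
    branches_info[branch.name] = branch.commit.url
end

# List the commits of every branch by passing the branch name as :sha;
# :since and :until are the commits API's date filters
commits_list = []
branches_info.keys.each do |branch_name|
    commits_list.push(github_connection.repos.commits.list(owner, repo_name, :since => start_date, :until => end_date, :sha => branch_name))
end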

Gerson Scanapieco answered Oct 10 '22 20:10