Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the oldest commit in a GitHub repository via the API

What is the most efficient way to determine when the initial commit in a GitHub repository was made? Repositories have a created_at property, but for repositories that contain imported history the oldest commit may be significantly older.

When using the command line something like this would work:

git rev-list --max-parents=0 HEAD

However I don't see an equivalent in the GitHub API.

like image 558
Mihai Parparita Avatar asked Aug 04 '14 05:08

Mihai Parparita


1 Answers

Using the GraphQL API, there is a workaround for getting the oldest commit (initial commit) in a specific branch.

First get the last commit and return the totalCount and the endCursor :

{
  repository(name: "linux", owner: "torvalds") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}

It returns something like that for the cursor and pageInfo object :

"totalCount": 931886,
"pageInfo": {
  "endCursor": "b961f8dc8976c091180839f4483d67b7c2ca2578 0"
}

I don't have any source about the cursor string format b961f8dc8976c091180839f4483d67b7c2ca2578 0 but I've tested with some other repository with more than 1000 commits and it seems that it's always formatted like:

<static hash> <incremented_number>

So you would just subtract 2 from totalCount (if totalCount is > 1) and get that oldest commit (or initial commit if you prefer):

{
  repository(name: "linux", owner: "torvalds") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1, after: "b961f8dc8976c091180839f4483d67b7c2ca2578 931884") {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}

which gives the following output (initial commit by Linus Torvalds) :

{
  "data": {
    "repository": {
      "ref": {
        "target": {
          "history": {
            "nodes": [
              {
                "message": "Linux-2.6.12-rc2\n\nInitial git repository build. I'm not bothering with the full history,\neven though we have it. We can create a separate \"historical\" git\narchive of that later if we want to, and in the meantime it's about\n3.2GB when imported into git - space that would just make the early\ngit days unnecessarily complicated, when we don't have a lot of good\ninfrastructure for it.\n\nLet it rip!",
                "committedDate": "2005-04-16T22:20:36Z",
                "authoredDate": "2005-04-16T22:20:36Z",
                "oid": "1da177e4c3f41524e886b7f1b8a0c1fc7321cac2",
                "author": {
                  "email": "[email protected]",
                  "name": "Linus Torvalds"
                }
              }
            ],
            "totalCount": 931886,
            "pageInfo": {
              "endCursor": "b961f8dc8976c091180839f4483d67b7c2ca2578 931885"
            }
          }
        }
      }
    }
  }
}

A simple implementation in python to get the first commit using this method :

import requests

token = "YOUR_TOKEN"

name = "linux"
owner = "torvalds"
branch = "master"

query = """
query ($name: String!, $owner: String!, $branch: String!){
  repository(name: $name, owner: $owner) {
    ref(qualifiedName: $branch) {
      target {
        ... on Commit {
          history(first: 1, after: %s) {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}
"""

def getHistory(cursor):
    r = requests.post("https://api.github.com/graphql",
        headers = {
            "Authorization": f"Bearer {token}"
        },
        json = {
            "query": query % cursor,
            "variables": {
                "name": name,
                "owner": owner,
                "branch": branch
            }
        })
    return r.json()["data"]["repository"]["ref"]["target"]["history"]

#in the first request, cursor is null
history = getHistory("null")
totalCount = history["totalCount"]
if (totalCount > 1):
    cursor = history["pageInfo"]["endCursor"].split(" ")
    cursor[1] = str(totalCount - 2)
    history = getHistory(f"\"{' '.join(cursor)}\"")
    print(history["nodes"][0])
else:
    print("got oldest commit (initial commit)")
    print(history["nodes"][0])

You can find an example in javascript on this post

like image 178
Bertrand Martel Avatar answered Oct 05 '22 23:10

Bertrand Martel