What is the most efficient way to determine when the initial commit in a GitHub repository was made? Repositories have a created_at
property, but for repositories that contain imported history the oldest commit may be significantly older.
When using the command line something like this would work:
git rev-list --max-parents=0 HEAD
However I don't see an equivalent in the GitHub API.
Using the GraphQL API, there is a workaround for getting the oldest commit (initial commit) in a specific branch.
First, get the latest commit and request totalCount and endCursor along with it:
{
repository(name: "linux", owner: "torvalds") {
ref(qualifiedName: "master") {
target {
... on Commit {
history(first: 1) {
nodes {
message
committedDate
authoredDate
oid
author {
email
name
}
}
totalCount
pageInfo {
endCursor
}
}
}
}
}
}
}
It returns something like this for totalCount and the pageInfo object:
"totalCount": 931886,
"pageInfo": {
"endCursor": "b961f8dc8976c091180839f4483d67b7c2ca2578 0"
}
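I don't have a source for the cursor string format (b961f8dc8976c091180839f4483d67b7c2ca2578 0), but I've tested other repositories with more than 1000 commits and it always seems to be formatted like:
<static hash> <incremented_number>
The number looks like a zero-based offset into the history, and after returns the commit that follows the cursor, so the oldest commit (offset totalCount - 1) comes right after offset totalCount - 2.
So you would just subtract 2 from totalCount (if totalCount is > 1) and request that oldest commit (or initial commit if you prefer):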
{
repository(name: "linux", owner: "torvalds") {
ref(qualifiedName: "master") {
target {
... on Commit {
history(first: 1, after: "b961f8dc8976c091180839f4483d67b7c2ca2578 931884") {
nodes {
message
committedDate
authoredDate
oid
author {
email
name
}
}
totalCount
pageInfo {
endCursor
}
}
}
}
}
}
}
This gives the following output (the initial commit by Linus Torvalds):
{
"data": {
"repository": {
"ref": {
"target": {
"history": {
"nodes": [
{
"message": "Linux-2.6.12-rc2\n\nInitial git repository build. I'm not bothering with the full history,\neven though we have it. We can create a separate \"historical\" git\narchive of that later if we want to, and in the meantime it's about\n3.2GB when imported into git - space that would just make the early\ngit days unnecessarily complicated, when we don't have a lot of good\ninfrastructure for it.\n\nLet it rip!",
"committedDate": "2005-04-16T22:20:36Z",
"authoredDate": "2005-04-16T22:20:36Z",
"oid": "1da177e4c3f41524e886b7f1b8a0c1fc7321cac2",
"author": {
"email": "[email protected]",
"name": "Linus Torvalds"
}
}
],
"totalCount": 931886,
"pageInfo": {
"endCursor": "b961f8dc8976c091180839f4483d67b7c2ca2578 931885"
}
}
}
}
}
}
}
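The cursor arithmetic and the date lookup can be sketched as two small helpers. This is only a sketch: the names oldest_commit_cursor and commit_date are made up for illustration, and it assumes the undocumented "<hash> <offset>" cursor format observed above keeps holding.

```python
from datetime import datetime, timezone


def oldest_commit_cursor(end_cursor, total_count):
    """Build the `after` cursor that sits just before the oldest commit.

    Assumes the undocumented "<hash> <offset>" cursor format observed above.
    Returns None when there is at most one commit, because the first page
    already contains the oldest commit in that case.
    """
    if total_count <= 1:
        return None
    # Keep the hash part and replace only the trailing offset.
    commit_hash, _offset = end_cursor.rsplit(" ", 1)
    return f"{commit_hash} {total_count - 2}"


def commit_date(history):
    """Parse committedDate of the first node into a timezone-aware UTC datetime."""
    raw = history["nodes"][0]["committedDate"]
    return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Values taken from the responses shown above.
cursor = oldest_commit_cursor("b961f8dc8976c091180839f4483d67b7c2ca2578 0", 931886)
print(cursor)  # b961f8dc8976c091180839f4483d67b7c2ca2578 931884

sample_history = {"nodes": [{"committedDate": "2005-04-16T22:20:36Z"}]}
print(commit_date(sample_history).isoformat())  # 2005-04-16T22:20:36+00:00
```

Returning None for a single-commit history lets the caller reuse the first response instead of sending a second request.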
A simple implementation in Python to get the first commit using this method:
import requests
token = "YOUR_TOKEN"
name = "linux"
owner = "torvalds"
branch = "master"
query = """
query ($name: String!, $owner: String!, $branch: String!){
repository(name: $name, owner: $owner) {
ref(qualifiedName: $branch) {
target {
... on Commit {
history(first: 1, after: %s) {
nodes {
message
committedDate
authoredDate
oid
author {
email
name
}
}
totalCount
pageInfo {
endCursor
}
}
}
}
}
}
}
"""
def getHistory(cursor):
    # cursor is spliced into the query text, so pass either the literal null
    # or a cursor string that is already wrapped in double quotes
    r = requests.post(
        "https://api.github.com/graphql",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "query": query % cursor,
            "variables": {"name": name, "owner": owner, "branch": branch},
        },
    )
    r.raise_for_status()
    return r.json()["data"]["repository"]["ref"]["target"]["history"]

# in the first request, cursor is null
history = getHistory("null")
totalCount = history["totalCount"]
if totalCount > 1:
    # keep the hash part of the cursor and move the offset to totalCount - 2,
    # so that the next commit returned is the oldest one
    cursor = history["pageInfo"]["endCursor"].split(" ")
    cursor[1] = str(totalCount - 2)
    history = getHistory('"' + " ".join(cursor) + '"')
    print(history["nodes"][0])
else:
    print("got oldest commit (initial commit)")
    print(history["nodes"][0])
You can find an example in JavaScript on this post.
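As a variation, the cursor can be passed as a GraphQL variable instead of being spliced into the query text with %s, which avoids the manual quoting. A minimal sketch, assuming the same endpoint and a personal access token; build_payload and fetch_history are hypothetical names, the standard-library urllib is swapped in for requests to keep the example dependency-free, and $after is declared as a nullable String so the first request can send null.

```python
import json
import urllib.request

QUERY = """
query ($name: String!, $owner: String!, $branch: String!, $after: String) {
  repository(name: $name, owner: $owner) {
    ref(qualifiedName: $branch) {
      target {
        ... on Commit {
          history(first: 1, after: $after) {
            nodes { oid message committedDate authoredDate }
            totalCount
            pageInfo { endCursor }
          }
        }
      }
    }
  }
}
"""


def build_payload(owner, name, branch, after=None):
    """Return the JSON body for one request; after=None is sent as null."""
    return {
        "query": QUERY,
        "variables": {"owner": owner, "name": name, "branch": branch, "after": after},
    }


def fetch_history(token, owner, name, branch, after=None):
    """POST the query and return the `history` object of the branch."""
    req = urllib.request.Request(
        "https://api.github.com/graphql",
        data=json.dumps(build_payload(owner, name, branch, after)).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    # GraphQL reports problems in an "errors" list, often with HTTP status 200.
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    ref = body["data"]["repository"]["ref"]
    if ref is None:
        raise LookupError(f"branch {branch!r} not found")
    return ref["target"]["history"]
```

Calling fetch_history(token, "torvalds", "linux", "master") once, and then again with after set to the adjusted cursor, mirrors the two requests shown above.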