I am trying to understand the evolution of the 100 largest repositories on GitHub. I can easily access the 100 largest repositories as of today (as measured by total number of contributors, stars, forks or LOC) using the GitHub search function or GithubArchive.org.
However, I would like to look at the 100 largest repositories at a given date in history (say, 1st of April 2011), so that I can track their growth (or decline) from that point on. How can I identify the 100 largest repositories on GitHub (as measured by stars, forks, or LOC) for a date in the past?
I think the GitHub archive project can be of help: http://www.githubarchive.org/
It stores all the public events from the GitHub timeline and exposes them for processing. The events contain info about the repositories, so you should be able to pull the data out of there to fit your use-case.
For example, I've just used the following query in the BigQuery console ( https://bigquery.cloud.google.com/?pli=1 ) to find out the number of forks of the joyent/node repository for the date 2012-03-15:
SELECT repository_forks, created_at FROM [publicdata:samples.github_timeline] WHERE (repository_url = "https://github.com/joyent/node") AND (created_at CONTAINS "2012-03-15") LIMIT 1
And here are the results:
Row forks created_at
1 1579 2012-03-15 07:49:54
Obviously, you would use the BigQuery API to do something similar (extract the data you want, fetch data for a range of dates, etc.).
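As a rough sketch of what that could look like from Python: the helper names below (`build_forks_query`, `queries_for_range`, `run_query`) are made up for illustration, and `run_query` assumes the third-party `google-cloud-bigquery` package, valid credentials, and that the sample table is still reachable under the name used in this answer. The query text itself is the same legacy-SQL query as above, just parameterised by repository and date.

```python
from datetime import date, timedelta

# Table name copied from the query in this answer (BigQuery legacy SQL syntax).
TABLE = "[publicdata:samples.github_timeline]"


def build_forks_query(repo_url, day):
    """Return the legacy-SQL text that reads the fork count of one repo on one day."""
    # The URL is interpolated into the query text, so reject embedded quotes.
    if '"' in repo_url:
        raise ValueError("repo_url must not contain double quotes")
    return (
        f"SELECT repository_forks, created_at FROM {TABLE} "
        f'WHERE (repository_url = "{repo_url}") '
        f'AND (created_at CONTAINS "{day.isoformat()}") LIMIT 1'
    )


def queries_for_range(repo_url, start, end):
    """Yield (day, sql) pairs for every day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day, build_forks_query(repo_url, day)
        day += timedelta(days=1)


def run_query(sql):
    """Run a legacy-SQL query and return the rows as dicts.

    Assumption: google-cloud-bigquery is installed and credentials are set up.
    The import is done lazily so the helpers above work without the package.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(use_legacy_sql=True)
    rows = client.query(sql, job_config=config).result()
    return [dict(row.items()) for row in rows]
```

Looping over `queries_for_range("https://github.com/joyent/node", date(2012, 3, 1), date(2012, 3, 31))` and passing each query to `run_query` would then give one fork count per day for that month, provided events exist for each day.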
And here is a query for fetching the single largest repository (by forks) for a given date:
SELECT repository_forks, repository_url FROM [publicdata:samples.github_timeline] WHERE (created_at CONTAINS "2012-03-15") ORDER BY repository_forks DESC LIMIT 1
Result:
Row forks repository_url
1 6341 https://github.com/octocat/Spoon-Knife
And here is the query to fetch the top 100 repositories by forks for a given date:
SELECT MAX(repository_forks) as forks, repository_url FROM [publicdata:samples.github_timeline] WHERE (created_at CONTAINS "2012-03-15") GROUP BY repository_url ORDER BY forks DESC LIMIT 100
Result:
Row forks repository_url
1 6341 https://github.com/octocat/Spoon-Knife
2 4452 https://github.com/twitter/bootstrap
3 3647 https://github.com/mxcl/homebrew
4 2888 https://github.com/rails/rails
...
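To make it clearer why the top-100 query needs GROUP BY and MAX: a repository shows up in many events on the same day, each carrying the fork count at that moment, so the query keeps the highest count per repository before ranking. Here is a small pure-Python sketch of the same logic. The function name and the sample events are invented for illustration; only the field names come from the table.

```python
def top_repos_by_forks(events, day, limit=100):
    """Rank repositories by their highest fork count seen on `day`.

    `events` is an iterable of dicts with the keys repository_url,
    repository_forks and created_at (a 'YYYY-MM-DD HH:MM:SS' string).
    """
    best = {}
    for event in events:
        # Mirrors `created_at CONTAINS "<day>"` in the query.
        if day not in event["created_at"]:
            continue
        url = event["repository_url"]
        # Mirrors MAX(repository_forks) ... GROUP BY repository_url.
        best[url] = max(best.get(url, 0), event["repository_forks"])
    # Mirrors ORDER BY forks DESC LIMIT <limit>.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Made-up events, purely to exercise the function.
sample_events = [
    {"repository_url": "https://github.com/example/alpha",
     "repository_forks": 10, "created_at": "2012-03-15 01:00:00"},
    {"repository_url": "https://github.com/example/alpha",
     "repository_forks": 12, "created_at": "2012-03-15 20:00:00"},
    {"repository_url": "https://github.com/example/beta",
     "repository_forks": 30, "created_at": "2012-03-15 09:30:00"},
    {"repository_url": "https://github.com/example/gamma",
     "repository_forks": 99, "created_at": "2012-03-16 00:10:00"},
]

top = top_repos_by_forks(sample_events, "2012-03-15", limit=2)
```

Running the function once per date of interest and comparing the rankings is one way to follow how the top repositories change over time.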