How to find the 100 largest GitHub repositories for a past date?

I am trying to understand the evolution of the 100 largest repositories on GitHub. I can easily access the 100 largest repositories as of today (as measured by total number of contributors, stars, forks or LOC) using the GitHub search function or GithubArchive.org.

However, I would like to look at the 100 largest repositories at a given date in history (say, 1st of April 2011), so that I can track their growth (or decline) from that point on. How can I identify the 100 largest repositories on GitHub (as measured by stars, forks, or LOC) for a date in the past?

histelheim asked Dec 06 '12 14:12



1 Answer

I think the GitHub archive project can be of help: http://www.githubarchive.org/

It stores all the public events from the GitHub timeline and exposes them for processing. The events contain info about the repositories, so you should be able to pull the data out of there to fit your use-case.

For example, I've just used the following query in the BigQuery console ( https://bigquery.cloud.google.com/?pli=1 ) to find out the number of forks of the joyent/node repository for the date 2012-03-15:

SELECT repository_forks, created_at FROM [publicdata:samples.github_timeline] WHERE (repository_url = "https://github.com/joyent/node") AND (created_at CONTAINS "2012-03-15") LIMIT 1

Here are the results:

Row forks   created_at   
1   1579    2012-03-15 07:49:54  

Obviously, you would use the BigQuery API to do something similar programmatically (extract the data you want, fetch data for a range of dates, etc.).
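As a rough sketch of the "range of dates" idea, the single-repository query above could be generated once per day from a small script. The helper names fork_count_query and queries_for_range are hypothetical, and actually submitting the queries to BigQuery is left out, since that depends on your client library and credentials. The values are interpolated directly into the query text, so this is only suitable for trusted input.

```python
from datetime import date, timedelta

# Table name taken from the answer; the helper names below are hypothetical.
TABLE = "[publicdata:samples.github_timeline]"


def fork_count_query(repo_url: str, day: date) -> str:
    """Build the answer's legacy-SQL query for one repository and one day."""
    return (
        "SELECT repository_forks, created_at "
        f"FROM {TABLE} "
        f'WHERE (repository_url = "{repo_url}") '
        f'AND (created_at CONTAINS "{day.isoformat()}") '
        "LIMIT 1"
    )


def queries_for_range(repo_url: str, start: date, end: date) -> list[str]:
    """Return one query per day from start to end, inclusive."""
    days = (end - start).days
    return [
        fork_count_query(repo_url, start + timedelta(days=i))
        for i in range(days + 1)
    ]


if __name__ == "__main__":
    repo = "https://github.com/joyent/node"
    for query in queries_for_range(repo, date(2012, 3, 15), date(2012, 3, 17)):
        print(query)
```

Each generated string can then be pasted into the BigQuery console or sent through whichever API client you use.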

And here is a query for fetching the single largest repository (by forks) for a given date:

SELECT repository_forks, repository_url FROM [publicdata:samples.github_timeline] WHERE (created_at CONTAINS "2012-03-15") ORDER BY repository_forks DESC LIMIT 1

Result:

Row forks   repository_url   
1   6341    https://github.com/octocat/Spoon-Knife   

And here is the query to fetch the top 100 repositories by forks for a given date:

SELECT MAX(repository_forks) as forks, repository_url FROM [publicdata:samples.github_timeline] WHERE (created_at CONTAINS "2012-03-15") GROUP BY repository_url ORDER BY forks DESC LIMIT 100

Result:

Row forks   repository_url   
1   6341    https://github.com/octocat/Spoon-Knife   
2   4452    https://github.com/twitter/bootstrap     
3   3647    https://github.com/mxcl/homebrew     
4   2888    https://github.com/rails/rails
...
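
The MAX(...) with GROUP BY matters because a repository shows up once per event, so the same repository_url can occur many times on a single day. Here is a minimal local sketch of that aggregation, using Python's sqlite3 module as a stand-in for BigQuery. The rows are made up for illustration and are not real GitHub numbers, the function name top_repos_by_forks is hypothetical, and since SQLite has no CONTAINS operator, a LIKE prefix match is assumed to be an acceptable substitute here.

```python
import sqlite3

# Made-up rows for illustration only; these are not real GitHub numbers.
EVENTS = [
    ("https://github.com/example/alpha", 10, "2012-03-15 01:00:00"),
    ("https://github.com/example/alpha", 12, "2012-03-15 20:00:00"),
    ("https://github.com/example/beta", 30, "2012-03-15 09:30:00"),
    ("https://github.com/example/beta", 35, "2012-03-16 09:30:00"),
    ("https://github.com/example/gamma", 5, "2012-03-15 12:00:00"),
]


def top_repos_by_forks(rows, day, limit):
    """Rank repositories by the highest fork count seen on the given day."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE github_timeline "
        "(repository_url TEXT, repository_forks INTEGER, created_at TEXT)"
    )
    conn.executemany("INSERT INTO github_timeline VALUES (?, ?, ?)", rows)
    # SQLite has no CONTAINS operator, so a LIKE prefix match stands in for it.
    result = conn.execute(
        "SELECT MAX(repository_forks) AS forks, repository_url "
        "FROM github_timeline "
        "WHERE created_at LIKE ? "
        "GROUP BY repository_url "
        "ORDER BY forks DESC "
        "LIMIT ?",
        (day + "%", limit),
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for forks, url in top_repos_by_forks(EVENTS, "2012-03-15", 2):
        print(forks, url)
```

Each repository appears only once in the output, with its highest fork count for that day, and events from other days are ignored.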
Ivan Zuzak answered Sep 23 '22 18:09