BigQuery: When is GHTorrent refreshed and how to get up to date information?

The ghtorrent-bq data is great to have as a snapshot of GitHub. However, it is not clear when it is updated and how I could get more up-to-date data.

Steren asked Mar 21 '17 at 18:03





1 Answer

In theory, it is updated every time a new GHTorrent MySQL dump is released. In practice, the generated CSVs still need manual adjustments, because free-text fields such as user locations contain a lot of unusual text that CSV parsers fail to handle.

http://ghtorrent.org/gcloud.html
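The answer does not say exactly which adjustments are made to the CSVs. Purely as a hypothetical illustration (this is not the GHTorrent pipeline, and `sanitize_field` / `rows_to_csv` are made-up names), here is a Python sketch of the kind of cleanup that tends to help: dropping control characters such as NUL, flattening embedded newlines and tabs in free-text fields, and then writing every field quoted.

```python
import csv
import io
import unicodedata


def sanitize_field(value: str) -> str:
    """Hypothetical cleanup for one free-text field (e.g. a user location).

    Assumption: this only illustrates the idea; it is not GHTorrent's code.
    - newline-like characters and tabs become a space
    - other control characters (Unicode category "Cc"), such as NUL, are dropped
    - runs of whitespace are collapsed and the ends are trimmed
    """
    out = []
    for ch in value:
        if ch in "\r\n\t":
            out.append(" ")
        elif unicodedata.category(ch) == "Cc":
            continue  # e.g. "\x00", which many CSV readers reject
        else:
            out.append(ch)
    return " ".join("".join(out).split())


def rows_to_csv(rows) -> str:
    """Write rows as CSV text with every field sanitized and quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in rows:
        writer.writerow([sanitize_field(str(v)) for v in row])
    return buf.getvalue()


if __name__ == "__main__":
    print(sanitize_field("Athens,\nGreece\x00"))
    print(rows_to_csv([[1, 'a "quoted"\r\nvalue']]), end="")
```

Flattening newlines is only one conservative choice. BigQuery's CSV loader can also accept quoted newlines when the corresponding load option is enabled, so whether to strip or keep them depends on how the data is loaded.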

Georgios Gousios answered Nov 15 '22 at 02:11
