Check if there is a newer version of my local file in Github, with R

In short: I need to get the date of the last change to a file hosted on GitHub.

In long: I have a file (an R workspace) on GitHub that is updated once in a while, and I would like to write an R function that checks whether my local copy is older than the one in the repo (if you're curious, my motivation is explained at the end of this post). This is the file I'm talking about.

In principle this should be fairly easy, since every file has a history page associated with it, but I don't know enough to make use of that. Also, this Q seems to hint at a way of doing what I want with PHP, but that's terra incognita for me, so I don't know whether it helps.

So, as I said in the short version of this post, I need to find a way to retrieve the date of the last commit for this file. I can find some way to compare it to the commit date of my local file afterwards.

Thanks in advance, Juan

Motivation: I'm working on an online course on R basics that uses a self-checking system for exercise solutions (i.e., students can check their results instantly). The system relies on a file with functions and data that is updated regularly, because I often find bugs and new problems. My goal is a function that tells students whether a newer file is available. It would also be neat to download it and replace the older copy, but that is secondary for now.

Juan asked May 16 '13 03:05



1 Answer

The tricky part is remembering the Git commit time of the copy you downloaded. The solution below sets the local file's modification time to the Git commit date after each download, so the next check has something to compare against.

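library(RCurl)
library(rjson)

destination = "datos"  # local copy, assumed to be in the current directory
repo = "https://api.github.com/repos/jumanbar/Curso-R/"
path = "ejercicios-de-programacion/rep-3/datos"
# ssl.verifypeer=FALSE disables certificate checks (see the note below)
myopts = curlOptions(useragent="whatever", ssl.verifypeer=FALSE)

# Most recent commit that touched the file (the API lists newest first)
d = fromJSON(getURL(paste0(repo, "commits?path=", path), .opts=myopts))[[1]]
# GitHub timestamps are ISO 8601 in UTC, e.g. "2013-05-16T03:05:00Z"
gitDate = as.POSIXct(d$commit$author$date, format="%Y-%m-%dT%H:%M:%SZ", tz="UTC")

# Download if there is no local copy or the local copy is older than the commit
MustDownload = !file.exists(destination) || file.info(destination)$mtime < gitDate
if (MustDownload) {
  commit = fromJSON(getURL(d$url, .opts=myopts))
  files = unlist(lapply(commit$files, "[[", "filename"))
  rawfile = commit$files[[which(files == path)]]$raw_url
  # mode="wb" because the R workspace is a binary file
  download.file(rawfile, destination, quiet=TRUE, mode="wb")
  # Stamp the local file with the commit date so the next check can compare
  Sys.setFileTime(destination, gitDate)
  print("File was downloaded")
}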

From R, the useragent and ssl.verifypeer options appear to be required; the same request works without them from the command line. Note that ssl.verifypeer=FALSE turns off certificate verification. If you are security-conscious, there is documentation on that subject floating around, but I took the easy path to commit.
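The comparison at the heart of this approach can be tried without touching the network. The following is only a minimal sketch: the helper names parse_github_date and needs_update are hypothetical (they do not appear in the code above), the commit dates are made up for illustration, and it assumes timestamps arrive in the "YYYY-MM-DDTHH:MM:SSZ" form.

```r
# Hypothetical helper: parse a timestamp such as "2013-05-16T03:05:00Z".
# The trailing "Z" means UTC, so the time zone is set explicitly.
parse_github_date <- function(x) {
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
}

# Hypothetical helper: TRUE when the local copy is missing
# or older than the remote commit date.
needs_update <- function(local_path, remote_date) {
  if (!file.exists(local_path)) return(TRUE)
  file.info(local_path)$mtime < remote_date
}

# Offline demonstration with a temporary file and made-up commit dates.
local_file <- tempfile()
writeLines("placeholder", local_file)
last_sync <- parse_github_date("2013-05-16T03:05:00Z")
Sys.setFileTime(local_file, last_sync)  # same stamping step as after a download

same_commit  <- needs_update(local_file, last_sync)
newer_commit <- needs_update(local_file, parse_github_date("2013-06-01T12:00:00Z"))
missing_file <- needs_update(tempfile(), last_sync)
```

Because the file is stamped with the commit date rather than the download time, an unchanged repo yields equal timestamps and no download, while any later commit makes the local copy look older.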

Dieter Menne answered Oct 03 '22 05:10