Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

available.packages by publication date

Tags:

r

cran

Is it possible to get the publication date of CRAN packages from within R? I would like to get a list of the k most recently published CRAN packages, or alternatively all packages published after date dd-mm-yy. Similar to the information on the available_packages_by_date.html?

The available.packages() command has a "fields" argument, but this only extracts fields from the DESCRIPTION. The date field on the package description is not always up-to-date.

I can get it with a smart regex from the html page, but I am not sure how reliable and up-to-date the this html file is... At some point Kurt might decide to give the layout a makeover which would break the script. An alternative is to use timestamps from the CRAN FTP but I am also not sure how good this solution is. I am not sure if there is somewhere a formally structured file with publication dates? I assume the HTML page is automatically generated from some DB.

like image 740
Jeroen Ooms Avatar asked Jan 04 '12 05:01

Jeroen Ooms


2 Answers

Turns out there is an undocmented file "packages.rds" which contains the publication dates (not times) of all packages. I suppose these data are used to recreate the HTML file every day.

Below a simple function that extracts publication dates from this file:

recent.packages.rds <- function(){
    mytemp <- tempfile();
    download.file("http://cran.r-project.org/web/packages/packages.rds", mytemp);
    mydata <- as.data.frame(readRDS(mytemp), row.names=NA);
    mydata$Published <- as.Date(mydata[["Published"]]);

    #sort and get the fields you like:
    mydata <- mydata[order(mydata$Published),c("Package", "Version", "Published")];
}
like image 121
Jeroen Ooms Avatar answered Oct 11 '22 10:10

Jeroen Ooms


The best approach is to take advantage of the fact the package DESCRIPTION is published on the cran mirror, and since the DESCRIPTION is from the build package, it contains information about exactly when it was packaged:

pkgs <- unname(available.packages()[, 1])[1:20]
desc_urls <- paste("http://cran.r-project.org/web/packages/", pkgs, "/DESCRIPTION", sep = "")
desc <- lapply(desc_urls, function(x) read.dcf(url(x)))

sapply(desc, function(x) x[, "Packaged"])
sapply(desc, function(x) x[, "Date/Publication"])

(I'm restricting it to the first 20 packages here to illustrate the basic idea)

like image 44
hadley Avatar answered Oct 11 '22 08:10

hadley