
available CRAN vignettes

There's the available.packages() function to list all packages available on CRAN. Is there a similar function to find all available vignettes? If not, how would I get a list of all vignettes and the packages they're associated with?

As a corner case to keep in mind, the data.table package has 3 vignettes associated with it.

EDIT: Per Andrie's response, I realize I wasn't clear. I know about the vignette() function for finding all the available local vignettes; I'm after a way to get all the vignettes of all packages on CRAN.
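For reference, the local lookup I mean is just the vignette() function from utils, e.g. (the second call assumes data.table is installed):

vignette()                        # vignettes of all installed packages
vignette(package = "data.table")  # the 3 data.table vignettes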

asked May 31 '12 by Tyler Rinker


1 Answer

I seem to recall looking at this in response to some SO question (can't find it now) and deciding that, since the information isn't included in the output of available.packages(), nor in the result of applying readRDS() to <CRAN mirror>/web/packages/packages.rds (a trick from Jeroen Ooms), I couldn't think of a non-scraping way to do it ...
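For what it's worth, here's a quick way to check the packages.rds point yourself (a sketch, assuming getOption("repos") points at a single real CRAN mirror):

tf <- tempfile(fileext = ".rds")
## mode = "wb" keeps the binary .rds intact on Windows
download.file(paste0(getOption("repos"), "/web/packages/packages.rds"),
              tf, quiet = TRUE, mode = "wb")
colnames(readRDS(tf))  ## no vignette-related column turns up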

Here's my scraper, applied to the first 100 packages (which turns up 44 vignettes):

## first 100 package names on CRAN
pkgs <- unname(available.packages()[, 1])[1:100]
## note: paste0() has no 'sep' argument, so the stray sep = "" is gone
vindex_urls <- paste0(getOption("repos"), "/web/packages/", pkgs,
                      "/vignettes/index.rds")
getf <- function(x) {
    ## I think there should be a way to do this directly
    ## with readRDS(url(...)) but I can't get it to work;
    ## mode = "wb" keeps the binary .rds intact on Windows
    suppressWarnings(
        download.file(x, "tmp.rds", quiet = TRUE, mode = "wb"))
    readRDS("tmp.rds")
}
library(plyr)
## llply() (not ldply()): keep the results as a list so that
## mapply() below can pair them with pkgs; packages without a
## vignette index give a download error and return NULL
vv <- llply(vindex_urls,
            .progress = "text",
            function(x) {
                if (inherits(z <- try(getf(x), silent = TRUE),
                             "try-error")) NULL else z
            })
## tag each vignette index with its package name, dropping the NULLs
tmpf <- function(x, n) {
    if (is.null(x)) NULL else data.frame(pkg = n, x)
}
vframe <- do.call(rbind, mapply(tmpf, vv, pkgs, SIMPLIFY = FALSE))
rownames(vframe) <- NULL
head(vframe[, c("pkg", "Title")])

There may be ways to clean this up and make it more compact, but it seems to work OK. Your scrape-once/update-occasionally strategy seems reasonable. Or, if you wanted, you could scrape daily (or weekly, or whatever seems reasonable) and save/post the results somewhere publicly accessible, then include a function with that URL hard-coded in the package ... or even create a nicely formatted HTML table, with links, that the whole world could use (and then add Viagra ads to the page, and $$PROFIT$$ ...)
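Along those lines, a minimal sketch of the scrape-once/update-occasionally idea (scrape_vignettes() is hypothetical shorthand for the scraper above wrapped up as a function; the cache file name is arbitrary):

## re-scrape only when the cached copy is older than max_age_days;
## scrape_vignettes() is a hypothetical wrapper around the code above
get_vignette_index <- function(cache = "vignette_index.rds",
                               max_age_days = 7) {
    fresh <- file.exists(cache) &&
        difftime(Sys.time(), file.mtime(cache), units = "days") < max_age_days
    if (fresh) return(readRDS(cache))
    res <- scrape_vignettes()
    saveRDS(res, cache)
    res
}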

Edit: I wrapped both the download and the readRDS() call in a single function (getf()) so that I can wrap the whole thing in try().

answered Sep 28 '22 by Ben Bolker