
How to loop over all data sets (and determine their number of columns)?

Tags:

r

I would like to loop through the data sets of all available (= installed) packages and find out which of them have 6 or more columns. Here is my attempt:

dat.list <- data(package=.packages(all.available=TRUE))$results # all data sets of all installed packages
colnames(dat.list) # "Package" "LibPath" "Item" (= name of data set) "Title" (= description)
idx <- c()
i <- 3
## for(i in nrow(dat.list)) {
    nme <- dat.list[[i,"Item"]] # data set as string
    data(list=nme, package=dat.list[[i,"Package"]]) # load the data
    ## => fails with warning: In data(list = nme, package = dat.list[[i, "Package"]]) :
    ##    data set 'BJsales.lead (BJsales)' not found
    dat <- eval(as.name(nme)) # assign the data to the variable dat
    ncl <- ncol(dat)
    if(!is.null(ncl) && ncl >= 6) idx <- c(idx, i)
## }

It obviously doesn't work, so I fixed an index (here: 3) to see what fails. How (if not via nme above) can I determine the name of the data set in order to store the data set in a variable and then access its number of columns?

UPDATE: Combining the posts by jeremycg and nico, I came up with this (again: not perfect at figuring out the names of the data sets, but it runs through):

dat.list <- data(package=.packages(all.available=TRUE))$results # all data sets of all installed packages
idx <- c()
for (i in 1:nrow(dat.list))
{
    require(dat.list[i, "Package"], character.only=TRUE)
    raw.name <- dat.list[i, "Item"] # data set (and parenthetical suffix) as raw string
    name <- gsub('\\s.*','', raw.name) # name of data set
    dat <- tryCatch(get(name), error=function(e) e) # assign the data to the variable dat (if not erroneous)
    if(is(dat, "simpleError")) {
        warning("Element ",i," threw an error")
        dat <- NA
    }
    ncl <- ncol(dat)
    if(!is.null(ncl) && ncl >= 6)
        idx <- c(idx, i)
}
dat.list[idx, c("Package", "Item")]
Marius Hofert asked Aug 17 '15 16:08


1 Answer

I guess that you need to load the package to access the data.

So you need to add at the beginning of the loop:

require(dat.list[[i, "Package"]], character.only = TRUE)

(see this question for why you need to use the character.only argument)
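As a small illustration of the point above (a sketch only; the variable name pkg is made up for this example and "stats" is used because it ships with R): when the package name is stored in a variable, character.only = TRUE tells require() to use the value of the variable rather than its name.

```r
# 'pkg' is a hypothetical variable holding a package name as a string
pkg <- "stats"

# With character.only = TRUE, require() looks up the *value* of pkg ("stats").
# Without it, require(pkg) would look for a package literally called "pkg".
ok <- require(pkg, character.only = TRUE)

# require() returns TRUE (invisibly) if the package could be attached
print(ok)
```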

Note that you also need to change your loop from:

for(i in nrow(dat.list))

to

for(i in 1:nrow(dat.list))
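To see why this matters, here is a minimal sketch with a plain number standing in for nrow(dat.list) (the names n, visited_wrong and visited_right are made up for the example). Looping over a single number visits only that number, not the whole range.

```r
n <- 5  # stand-in for nrow(dat.list)

visited_wrong <- c()
for (i in n) visited_wrong <- c(visited_wrong, i)          # loops over the single value 5

visited_right <- c()
for (i in seq_len(n)) visited_right <- c(visited_right, i) # loops over 1, 2, ..., 5

print(visited_wrong)
print(visited_right)
```

seq_len(n) behaves like 1:n here, but it also yields an empty loop when n is 0, whereas 1:0 would iterate over 1 and 0.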

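There is another issue: some entries in the Item column carry a second name in parentheses after the data set name (as far as I can tell, this is the data file the object is stored in). For instance: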

wine.classes (wine)

So we need to strip those out. Easily done using:

dat.list[,3] <- sapply(strsplit(dat.list[,3], " "), function(x){x[1]})
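A quick check of that stripping step on a few hand-written strings (a sketch; the vector items is made up for the example and mixes entries with and without a parenthetical suffix). It also shows an equivalent one-liner using sub().

```r
# Hypothetical sample of "Item" entries, with and without the suffix
items <- c("wine.classes (wine)", "BJsales.lead (BJsales)", "mtcars")

# Approach from above: split on spaces and keep the first piece
via_strsplit <- sapply(strsplit(items, " "), function(x) x[1])

# Alternative: drop everything from the first whitespace onwards
via_sub <- sub("\\s.*$", "", items)

print(via_strsplit)
print(via_sub)
```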

Finally, dat.list can be indexed with plain [], there is no need for [[]] (easier to read!).

So, finally:

# All data sets of all installed packages
dat.list <- data(package=.packages(all.available=TRUE))$results

# Strip the parenthetical suffix from the data set names
dat.list[, "Item"] <- sapply(strsplit(dat.list[, "Item"], " "),
      function(x){x[1]})

idx <- c()
for (i in 1:nrow(dat.list))
    {
    # attach the package so that its data sets can be found
    require(dat.list[i, "Package"], character.only = TRUE)
    nme <- dat.list[i,"Item"] # data set as string
    data(list=nme, package=dat.list[i,"Package"]) # load the data

    dat <- eval(as.name(nme)) # assign the data to the variable dat
    ncl <- ncol(dat)
    if(!is.null(ncl) && ncl >= 6)
        idx <- c(idx, i)
    }

And:

> dat.list[idx, "Item"]
 [1] "Seatbelts"          "USJudgeRatings"     "WorldPhones"        "airquality"        
 [5] "anscombe"           "attitude"           "crimtab"            "euro.cross"        
 [9] "infert"             "longley"            "mtcars"             "occupationalStatus"
[13] "state.x77"          "swiss"              "volcano"            "car.test.frame"    
[17] "car90"              "solder"             "stagec"             "bladder"           
[21] "bladder1"           "bladder2"           "cancer"             "cgd"               
[25] "cgd0"               "colon"              "flchain"            "heart"             
[29] "jasa"               "jasa1"              "kidney"             "lung"              
[33] "mgus"               "mgus1"              "mgus2"              "nwtco"             
[37] "ovarian"            "pbc"                "pbcseq"             "rats2"             
[41] "transplant"         "veteran"            "soldat"             "patch"             
[45] "tooth"             
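As a possible variation on the loop above (a sketch, not a tested replacement), each data set can be loaded into a throwaway environment through the envir argument of data(), so that no package has to be attached and the global workspace stays clean. It assumes that the name in parentheses is the data file to pass to data(); the helper name ncol_of_item is made up, and the example is restricted to the datasets package to keep it quick.

```r
# Hypothetical helper: number of columns of one entry of data()$results,
# or NA if the object cannot be loaded or has no columns
ncol_of_item <- function(item, pkg) {
  obj <- sub("\\s.*$", "", item)  # object name, e.g. "BJsales.lead"
  # assumption: "obj (file)" means the object lives in data file 'file'
  file <- if (grepl("\\(", item)) sub("^.*\\((.*)\\)$", "\\1", item) else obj
  env <- new.env()
  suppressWarnings(data(list = file, package = pkg, envir = env))
  if (!exists(obj, envir = env, inherits = FALSE)) return(NA_integer_)
  nc <- ncol(get(obj, envir = env))
  if (is.null(nc)) NA_integer_ else nc
}

res <- data(package = "datasets")$results
nc  <- mapply(ncol_of_item, res[, "Item"], res[, "Package"])
wide <- sub("\\s.*$", "", res[!is.na(nc) & nc >= 6, "Item"])
print(wide)
```

Replacing "datasets" with .packages(all.available=TRUE) would extend the search to every installed package, at the cost of a much longer run.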
nico answered Sep 22 '22 03:09