I would like to loop through the data sets of all available (= installed) packages and find out whether these data sets have 6 or more columns. Here is my attempt:
dat.list <- data(package=.packages(all.available=TRUE))$results # data sets of all installed packages
colnames(dat.list) # "Package" "LibPath" "Item" (= name of data set) "Title" (= description)
idx <- c()
i <- 3
## for(i in nrow(dat.list)) {
nme <- dat.list[[i,"Item"]] # data set as string
data(list=nme, package=dat.list[[i,"Package"]]) # load the data
## => fails with warning: In data(list = nme, package = dat.list[[i, "Package"]]) :
## data set 'BJsales.lead (BJsales)' not found
dat <- eval(as.name(nme)) # assign the data to the variable dat
ncl <- ncol(dat)
if(!is.null(ncl) && ncl >= 6) idx <- c(idx, i)
## }
It obviously doesn't work, so I fixed an index (here: 3) to see what fails. How (if not via nme above) can I determine the name of the data set in order to store the data set in a variable and then access its number of columns?
UPDATE: Combining the posts by jeremycg and nico, I came up with this (again, not perfect at figuring out the names of the data sets, but it runs through):
dat.list <- data(package=.packages(all.available=TRUE))$results # data sets of all installed packages
idx <- c()
for (i in 1:nrow(dat.list))
{
    require(dat.list[i, "Package"], character.only=TRUE)
    raw.name <- dat.list[i, "Item"] # data set (and parenthetical suffix) as raw string
    name <- gsub('\\s.*','', raw.name) # name of data set
    dat <- tryCatch(get(name), error=function(e) e) # assign the data to the variable dat (if not erroneous)
    if(is(dat, "simpleError")) {
        warning("Element ",i," threw an error")
        dat <- NA
    }
    ncl <- ncol(dat)
    if(!is.null(ncl) && ncl >= 6)
        idx <- c(idx, i)
}
dat.list[idx, c("Package", "Item")]
I guess that you need to load the package to access the data.
So you need to add at the beginning of the loop:
require(dat.list[[i, "Package"]], character.only = TRUE)
(see this question for why you need to use the character.only argument)
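As a small illustration of that argument: without character.only = TRUE, require() treats what you type as the literal package name rather than as a variable holding the name. The variable pkg below is only an illustrative name, and "stats" is used because it ships with every R installation.

```r
pkg <- "stats"                               # package name stored in a variable

# require(pkg) would look for a package literally called "pkg";
# character.only = TRUE makes require() use the value of the variable instead.
ok <- require(pkg, character.only = TRUE)    # TRUE if the package could be attached
```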
Note that you also need to change your loop from:
for(i in nrow(dat.list))
to
for(i in 1:nrow(dat.list))
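To see the difference, here is a tiny sketch with n standing in for nrow(dat.list) (n, visited_wrong and visited_right are illustrative names only). It also uses seq_len(), which is a slightly safer alternative to 1:n because it yields an empty sequence when n is 0.

```r
n <- 3                                   # stand-in for nrow(dat.list)

visited_wrong <- c()
for (i in n)                             # iterates over the single value 3
  visited_wrong <- c(visited_wrong, i)

visited_right <- c()
for (i in seq_len(n))                    # iterates over 1, 2, 3
  visited_right <- c(visited_right, i)
```

So the original loop only ever looked at the last row.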
There is another issue: some data sets are listed with a second name in parentheses after the item name. For instance:
wine.classes (wine)
So we need to strip those out. Easily done using:
dat.list[,3] <- sapply(strsplit(dat.list[,3], " "), function(x){x[1]})
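A regular expression does the same job, and it can also keep the part in parentheses in case you need it. This is only a sketch on hand-written strings; items, object_names and file_names are illustrative names, and reading the parenthetical part as the name of the data file to pass to data() is my assumption based on the warning shown in the question.

```r
items <- c("BJsales.lead (BJsales)", "wine.classes (wine)", "mtcars")

# Everything before the first space: the name of the object itself
object_names <- sub(" .*$", "", items)

# The part in parentheses if present, otherwise the item unchanged
# (assumed here to be the name of the file that data() can load)
file_names <- ifelse(grepl("(", items, fixed = TRUE),
                     sub("^.*\\((.*)\\)$", "\\1", items),
                     items)
```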
Finally, dat.list can just be accessed using [], no need for [[]] (easier to read!).
So, finally:
# Data sets of all installed packages
dat.list <- data(package=.packages(all.available=TRUE))$results
# Remove the parenthetical suffix, e.g. "wine.classes (wine)" -> "wine.classes"
dat.list[, "Item"] <- sapply(strsplit(dat.list[, "Item"], " "),
                             function(x){x[1]})
idx <- c()
for (i in 1:nrow(dat.list))
{
    require(dat.list[i, "Package"], character.only = TRUE)
    nme <- dat.list[i,"Item"] # data set as string
    data(list=nme, package=dat.list[i,"Package"]) # load the data
    dat <- eval(as.name(nme)) # assign the data to the variable dat
    ncl <- ncol(dat)
    if(!is.null(ncl) && ncl >= 6)
        idx <- c(idx, i)
}
And:
> dat.list[idx, "Item"]
[1] "Seatbelts" "USJudgeRatings" "WorldPhones" "airquality"
[5] "anscombe" "attitude" "crimtab" "euro.cross"
[9] "infert" "longley" "mtcars" "occupationalStatus"
[13] "state.x77" "swiss" "volcano" "car.test.frame"
[17] "car90" "solder" "stagec" "bladder"
[21] "bladder1" "bladder2" "cancer" "cgd"
[25] "cgd0" "colon" "flchain" "heart"
[29] "jasa" "jasa1" "kidney" "lung"
[33] "mgus" "mgus1" "mgus2" "nwtco"
[37] "ovarian" "pbc" "pbcseq" "rats2"
[41] "transplant" "veteran" "soldat" "patch"
[45] "tooth"
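If you would rather not attach every installed package, one possible variation is to load each data set into a throwaway environment and look at it there. This is a rough sketch, not a tested replacement for the loop: count_cols is a hypothetical helper name, and it assumes that the part in parentheses (when present) is the name to hand to data().

```r
# Hypothetical helper: number of columns for one entry of data()$results,
# or NA if the object cannot be loaded or has no columns.
count_cols <- function(item, pkg) {
  obj_name  <- sub(" .*$", "", item)        # "BJsales.lead (BJsales)" -> "BJsales.lead"
  file_name <- if (grepl("(", item, fixed = TRUE)) {
    sub("^.*\\((.*)\\)$", "\\1", item)      # -> "BJsales" (assumed data file name)
  } else {
    item
  }
  env <- new.env()
  # Load into a private environment instead of attaching the package
  suppressWarnings(utils::data(list = file_name, package = pkg, envir = env))
  if (!exists(obj_name, envir = env, inherits = FALSE)) return(NA_integer_)
  ncl <- ncol(get(obj_name, envir = env))
  if (is.null(ncl)) NA_integer_ else ncl
}

n_mtcars  <- count_cols("mtcars", "datasets")            # a data frame
n_missing <- count_cols("no.such.data.set", "datasets")  # nothing to load, so NA
```

You could then apply it over the rows with something like mapply(count_cols, dat.list[, "Item"], dat.list[, "Package"]) and keep the rows where the result is at least 6. Some packages may still throw errors while loading their data, so wrapping the call in tryCatch() may be needed.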