
rfe in R's caret package gives the error: task 1 failed - "argument 1 is not a vector"

I have a training_predictors set with 56 columns, all of which are numeric. training_labels is a factor vector of 0 and 1.

I am testing the following vector of subset sizes.

subset_sizes <- c(1:5, 10, 15, 20, 25)

The following is my modified version of the rfFuncs list of functions.

rfRFE <- list(summary = defaultSummary, 
              fit = function(x, y, first, last, ...) {
                  library(randomForest)
                  randomForest(x, y, importance = first, ...)
              }, 
              pred = function(object, x) predict(object, x), 
              rank = function(object, x, y) {
                  vimp <- varImp(object)
                  vimp <- vimp[order(vimp$Overall, decreasing = TRUE),,drop = FALSE]
                  vimp$var <- rownames(vimp)
                  vimp
              }, 
              selectSize = pickSizeBest, 
              selectVar = pickVars)

I have declared the control function as:

rfeCtrl <- rfeControl(functions = rfRFE, 
                      method = "cv", 
                      number = 10, 
                      verbose = TRUE)

But when I run the rfe function as shown below,

rfProfile <- rfe(training_predictors, 
                 training_labels, 
                 sizes = subset_sizes, 
                 rfeControl = rfeCtrl)

I get this error:

Error in { : task 1 failed - "argument 1 is not a vector"

I also tried changing the subset_sizes vector, but with no luck. What am I doing wrong?

Update: I ran these steps one by one, and the problem seems to be in the rank function. But I still cannot figure out what is wrong.

Update: I found the problem. The result of varImp in the rank function does not contain an $Overall column. Instead, it contains columns named 0 and 1. Why is that? What do 0 and 1 signify (the values in both columns are exactly the same, by the way)? Also, how can I make varImp return an $Overall column? [As a temporary solution, I am creating a new $Overall column and attaching it to vimp in the rank function.]

exAres asked Sep 28 '22 17:09


1 Answer

Using 0 and 1 as factor levels is problematic since those are not valid R column names. In your other SO post you probably would have received a message about using these as factor levels for your output.
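As a quick illustration of why these levels are a problem, base R's make.names() shows what happens when "0" and "1" have to become syntactically valid names. This is only a sketch; lvls is a hypothetical stand-in for levels(training_labels) as described in the question.

```r
# Hypothetical stand-in for levels(training_labels) from the question
lvls <- c("0", "1")

# make.names() converts strings into syntactically valid R names;
# names that start with a digit get an "X" prepended
valid <- make.names(lvls)
print(valid)

# The converted names no longer match the original factor levels,
# so anything that looks up columns by level name can fail
print(valid == lvls)
```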

Try using a factor outcome with some more informative levels that can be translated into valid R column names (for class probabilities).
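A minimal sketch of that relabeling in base R follows. The data here is a made-up stand-in for training_labels, and the level names "neg" and "pos" are assumptions chosen for illustration; any names that are valid R column names should work.

```r
# Made-up stand-in for the training_labels factor from the question
training_labels <- factor(c(0, 1, 1, 0, 1))

# Map the old levels "0" and "1" to names that are valid R column names.
# "neg" and "pos" are hypothetical choices, not required by caret.
training_labels <- factor(training_labels,
                          levels = c("0", "1"),
                          labels = c("neg", "pos"))

print(levels(training_labels))
print(table(training_labels))
```

The relabeled factor can then be passed to rfe in place of the original one.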

topepo answered Nov 03 '22 19:11