 

Character string formula (from paste()) does not work with randomForest()

Tags:

r

I am trying to use randomForest with a formula that's been constructed through the paste() function. However, randomForest refuses to accept such a formula, while rpart does. Does anyone know how I can get this to work?

library(rpart)
library(randomForest)

# Construct a formula by pasting stuff together.
columnName <- "Species"
modelFormula <- paste(columnName, " ~ .")
print(modelFormula)
## [1] "Species  ~ ."


# Call rpart() and randomForest() with the constructed model.
model <- rpart(modelFormula, data=iris)
model <- randomForest(modelFormula, data=iris)
## Error in if (n == 0) stop("data (x) has 0 rows") : 
##   argument is of length zero

# This works if I directly include the formula.
model <- randomForest(Species ~ ., data=iris)
Asked by stackoverflowuser2010, Mar 12 '26 05:03


2 Answers

You need to coerce the character string to a formula object (using as.formula()) for it to work with randomForest():

R> model <- randomForest(as.formula(modelFormula), data=iris)
R> model

Call:
 randomForest(formula = as.formula(modelFormula), data = iris) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 4.67%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          4        46        0.08
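You can also skip the paste-then-coerce step. Base R's `reformulate()` (stats package) builds a formula object directly from a response name and a character vector of term labels. The sketch below assumes, as in the question, that the response name is held in a character variable. The result could then be passed to `randomForest()` in place of `as.formula(modelFormula)`:

```r
# Response name held in a variable, as in the question
columnName <- "Species"

# reformulate() returns an object of class "formula", not a string;
# "." keeps its usual meaning of "all other columns in data"
modelFormula <- reformulate(".", response = columnName)

print(modelFormula)
class(modelFormula)
```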

The character string and the formula object are different things, even though they print almost identically:

R> modelFormula
[1] "Species  ~ ."
R> as.formula(modelFormula)
Species ~ .
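The printed forms differ only in whitespace. The difference that matters is the class of the object, because that is what method dispatch looks at. The sketch below uses only base R and reuses the question's `modelFormula` string:

```r
# The string built in the question; paste() separates its arguments with a
# space by default, hence the double space before "~"
modelFormula <- paste("Species", " ~ .")

class(modelFormula)             # a character vector
f <- as.formula(modelFormula)
class(f)                        # a formula object

# The extra whitespace is gone once the text is parsed into a formula
deparse(f)
```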

This matters because randomForest() has a formula method, which is used only when the first argument is a formula object. Otherwise you get the default method, and that doesn't know what to do with a character string as its argument x. You can see the method dispatch at work below:

R> methods(randomForest)
[1] randomForest.default* randomForest.formula*

   Non-visible functions are asterisked
R> debugonce(randomForest:::randomForest.formula)
R> model <- randomForest(modelFormula, data=iris) ## 1
Error in if (n == 0) stop("data (x) has 0 rows") : 
  argument is of length zero
R> model <- randomForest(as.formula(modelFormula), data=iris)
debugging in: randomForest.formula(as.formula(modelFormula), data = iris)
debug: {
.... truncated

I flagged the formula method for debugging, but it doesn't get called unless you pass a formula object as the first argument. Hence the error in the first call (## 1 above). With a formula object, the randomForest.formula method is invoked, which we can see because we drop into the debugger.
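The same S3 dispatch rule shows up with a toy generic. The names `fitModel`, `fitModel.default` and `fitModel.formula` are hypothetical. They only mimic the `randomForest.default` / `randomForest.formula` pair listed by `methods(randomForest)`. `UseMethod()` picks the method from the class of the first argument, so a character string never reaches the formula method:

```r
# Hypothetical generic: dispatches on the class of its first argument
fitModel <- function(x, ...) UseMethod("fitModel")

# Fallback for anything that is not a formula, including character strings
fitModel.default <- function(x, ...) "default method"

# Only called when x inherits from class "formula"
fitModel.formula <- function(x, ...) "formula method"

fitModel("Species ~ .")               # string: falls through to the default
fitModel(Species ~ .)                 # formula literal: formula method
fitModel(as.formula("Species ~ ."))   # coerced string: formula method
```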

Answered by Gavin Simpson, Mar 13 '26 20:03


Do:

model <- randomForest(as.formula(modelFormula), data=iris)

Result:

> model

Call:
 randomForest(formula = as.formula(modelFormula), data = iris) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 4%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          3        47        0.06
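`as.formula()` returns its argument unchanged when it is already a formula. That means a helper can accept either form and always hand a formula object to the modelling function. In the sketch below, `fitWithFormula` is a hypothetical name, and `lm()` stands in for `randomForest()` so the example runs with base R only:

```r
# Hypothetical helper: accepts either a formula or a character string
fitWithFormula <- function(fitter, f, data) {
  f <- as.formula(f)      # no-op if f is already a formula
  fitter(f, data = data)
}

# String built with paste(), as in the question
fromString <- fitWithFormula(lm, paste("Sepal.Length", "~ ."), iris)

# Formula literal passed straight through
fromFormula <- fitWithFormula(lm, Sepal.Length ~ ., iris)

# Both routes should produce the same fitted coefficients
all.equal(coef(fromString), coef(fromFormula))
```

With the randomForest package loaded, the same helper should work as `fitWithFormula(randomForest, modelFormula, iris)`. The first argument is then a formula object, so dispatch reaches randomForest.formula.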
Answered by Thomas, Mar 13 '26 19:03