I am attempting a random forest on some data where the class variables is binary (either 1 or 0). Here is the code I'm running:
forest.model <- randomForest(x = ticdata2000[,1:85], y = ticdata2000[,86],
ntree=500,
mtry=9,
importance=TRUE,
norm.votes=TRUE,
na.action=na.roughfix,
replace=FALSE,
)
But when the forest gets to the end, I get the following error:
Warning message:
In randomForest.default(x = ticdata2000[, 1:85], y = ticdata2000[, :
The response has five or fewer unique values. Are you sure you want to do regression?
The answer, of course, is no. I don't want to do regression. I have a single, discrete variable that only takes on 2 classes. Of course, when I run predictions with this model, I get continuous numbers, when I want a list of zeroes and ones. Can someone tell me what I'm doing wrong to get this to use regression and not classification?
Change your response column to a factor using as.factor
(or just factor
). Since you've stored that variable as numeric 0's and 1's, R rightly interprets it as a numeric variable. If you want R to treat it differently, you have to tell it so.
This is mentioned in the documentation under the y
argument:
A response vector. If a factor, classification is assumed, otherwise regression is assumed. If omitted, randomForest will run in unsupervised mode.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With