
set random forest to classification

I am attempting to fit a random forest to some data where the class variable is binary (either 1 or 0). Here is the code I'm running:

forest.model <- randomForest(x = ticdata2000[, 1:85], y = ticdata2000[, 86],
                             ntree = 500,
                             mtry = 9,
                             importance = TRUE,
                             norm.votes = TRUE,
                             na.action = na.roughfix,
                             replace = FALSE)

But when the forest finishes, I get the following warning:

Warning message:
In randomForest.default(x = ticdata2000[, 1:85], y = ticdata2000[,  :
  The response has five or fewer unique values.  Are you sure you want to do regression?

The answer, of course, is no: I don't want to do regression. I have a single, discrete response variable that takes only two values. As a result, when I run predictions with this model I get continuous numbers, when I want a vector of zeroes and ones. Can someone tell me what I'm doing wrong that makes this run regression instead of classification?

Asked by Eric, Jun 16 '13 23:06


1 Answer

Change your response column to a factor using as.factor (or just factor). Since you've stored that variable as numeric 0's and 1's, R rightly interprets it as a numeric variable. If you want R to treat it differently, you have to tell it so.

This is mentioned in the documentation under the y argument:

A response vector. If a factor, classification is assumed, otherwise regression is assumed. If omitted, randomForest will run in unsupervised mode.
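
As a minimal sketch of that fix: the data frame `toy` below is a made-up stand-in for ticdata2000 (85 numeric predictors and a 0/1 response in column 86), so its name and contents are assumptions for illustration only. The only substantive change from the question's call is converting the response with as.factor before fitting.

```r
library(randomForest)

# Hypothetical stand-in for ticdata2000: 85 numeric predictors,
# with a numeric 0/1 response stored in column 86.
set.seed(42)
n <- 200
toy <- as.data.frame(matrix(rnorm(n * 85), nrow = n))
toy$response <- rep(c(0, 1), length.out = n)

# Convert the numeric response to a factor so that randomForest
# assumes classification rather than regression.
toy$response <- as.factor(toy$response)

forest.model <- randomForest(x = toy[, 1:85], y = toy[, 86],
                             ntree = 100,
                             mtry = 9,
                             importance = TRUE)

# The fitted object records which mode was used.
forest.model$type

# Predictions are now a factor with levels "0" and "1".
preds <- predict(forest.model, toy[, 1:85])

# If numeric zeroes and ones are needed, go through character first;
# as.numeric() on a factor alone would return the internal level codes.
preds.numeric <- as.numeric(as.character(preds))
```

With the real data, the equivalent one-line change would be passing `y = as.factor(ticdata2000[, 86])` in the original call.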

Answered by joran, Sep 20 '22 18:09