Use of randomForest() for classification in R?

I originally had a data frame with 12 columns and N rows. The last column is my class (0 or 1). I had to convert the entire data frame to numeric with

training <- sapply(training.temp,as.numeric)

But then I thought I needed the class column to be a factor in order to use randomForest() as a classifier, so I did

training[,"Class"] <- factor(training[,ncol(training)])

I then build the forest with

training_rf <- randomForest(Class ~., data = trainData, importance = TRUE, do.trace = 100)

But I'm getting two warnings:

1: In Ops.factor(training[, "Status"], factor(training[, ncol(training)])) : 
'<=' not meaningful for factors
2: In randomForest.default(m, y, ...) :
The response has five or fewer unique values.  Are you sure you want to do regression?

I would appreciate it if someone could point out the formatting mistake I'm making.

Thanks!

asked Oct 10 '13 at 16:10 by marc



2 Answers

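So the issue is actually quite simple. It turns out my training data was no longer a data frame: sapply() returns a matrix (an atomic vector with dimensions), and a matrix can hold only one type, so the Class column could not be stored as a factor. The result first had to be converted back to a data frame, before the factor conversion. So I needed to add the following line: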

training <- as.data.frame(training)

Problem solved!
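
To see why the conversion matters, here is a minimal sketch in base R. The small `training.temp` below is a made-up stand-in (the column names `x1` and `x2` and all values are assumptions, not the original data); it only illustrates how the matrix returned by sapply() differs from a data frame.

```r
# Hypothetical stand-in for training.temp: character columns, Class is the last one
training.temp <- data.frame(
  x1 = c("1.5", "2.0", "3.5", "4.0"),
  x2 = c("10", "20", "30", "40"),
  Class = c("0", "1", "0", "1"),
  stringsAsFactors = FALSE
)

# sapply() simplifies its result to a numeric matrix, not a data frame
training <- sapply(training.temp, as.numeric)
is.matrix(training)      # TRUE
is.data.frame(training)  # FALSE

# A matrix holds a single type, so the factor does not survive the assignment
training[, "Class"] <- factor(training[, "Class"])
is.factor(training[, "Class"])  # FALSE

# Converting to a data frame first lets each column keep its own type
training_df <- as.data.frame(sapply(training.temp, as.numeric))
training_df$Class <- factor(training_df$Class)
is.factor(training_df$Class)    # TRUE
levels(training_df$Class)       # "0" "1"
```

Because the response in the matrix version is still numeric, randomForest() would most likely treat the problem as regression, which fits the second warning in the question.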

answered Oct 11 '22 at 08:10 by marc


First, your coercion to a factor is not taking effect: sapply() returns a matrix, and a matrix column cannot hold a factor. Second, I would recommend passing x and y by indexing, rather than using a formula, when specifying an RF model. Here are changes to your code that should make it work.

    library(randomForest)

    # sapply() returns a matrix, so convert back to a data frame before making Class a factor
    training <- as.data.frame(sapply(training.temp, as.numeric))
    training[,"Class"] <- as.factor(training[,"Class"])

    training_rf <- randomForest(x=training[,1:(ncol(training)-1)], y=training[,"Class"],
                                importance=TRUE, do.trace=100)

    # You can also coerce to a factor directly in the model statement
    training_rf <- randomForest(x=training[,1:(ncol(training)-1)], y=as.factor(training[,"Class"]),
                                importance=TRUE, do.trace=100)
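
As a quick sanity check that the forest is really being fit as a classifier, one option is to inspect the `type` component of the fitted object. The sketch below uses an invented `toy` data frame (its columns and values are assumptions for illustration only) and compares a factor response with a numeric one, through both the formula and the x/y interface.

```r
library(randomForest)

set.seed(42)  # reproducible toy data
n <- 100
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# 0/1 response stored as a factor
toy$Class <- factor(as.integer(toy$x1 + toy$x2 > 0))

# Factor response: both interfaces fit a classification forest
rf_formula <- randomForest(Class ~ ., data = toy, importance = TRUE)
rf_xy <- randomForest(x = toy[, c("x1", "x2")], y = toy$Class, importance = TRUE)
rf_formula$type  # "classification"
rf_xy$type       # "classification"

# Numeric 0/1 response: randomForest falls back to regression and warns
# about the response having five or fewer unique values
y_num <- as.numeric(as.character(toy$Class))
rf_num <- suppressWarnings(randomForest(x = toy[, c("x1", "x2")], y = y_num))
rf_num$type      # "regression"
```

If `type` comes back as "regression" for a 0/1 problem, the response was probably not a factor at the time of the call.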

answered Oct 11 '22 at 08:10 by Jeffrey Evans