
ROC curve for classification from randomForest

I am using the randomForest package in R for a classification task.

rf_object<-randomForest(data_matrix, label_factor, cutoff=c(k,1-k))

where k ranges from 0.1 to 0.9.

pred <- predict(rf_object,test_data_matrix)

I compared the output of the random forest classifier with the true labels, so I have performance measures (accuracy, MCC, sensitivity, specificity, etc.) for each of the 9 cutoff points.

Now I want to plot the ROC curve and obtain the area under it to see how good the performance is. Most R packages for this (such as ROCR and pROC) require predictions and labels as input, but what I have is sensitivity (TPR) and specificity (1-FPR) at each cutoff.

Can anyone tell me whether the cutoff method is a correct and reliable way to produce a ROC curve? Is there a way to obtain the ROC curve and the area under it directly from TPR and FPR values?
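One way to get a curve and an area from (FPR, TPR) pairs alone is to sort the points, add the trivial endpoints (0,0) and (1,1), and integrate with the trapezoidal rule. The sketch below uses base R only; the function name `roc_from_points` and the sensitivity/specificity values are made up for illustration and are not from the question. With only nine points the result is a coarse approximation of the full ROC curve.

```r
# Hypothetical sensitivity/specificity values for nine cutoffs (illustrative only)
sens <- c(0.98, 0.95, 0.90, 0.85, 0.78, 0.70, 0.60, 0.45, 0.30)
spec <- c(0.30, 0.45, 0.60, 0.70, 0.78, 0.85, 0.90, 0.95, 0.98)

# Build an ROC curve from (FPR, TPR) points and integrate it
# with the trapezoidal rule. Hypothetical helper, not a package function.
roc_from_points <- function(tpr, fpr) {
  # Add the endpoints every ROC curve passes through
  fpr <- c(0, fpr, 1)
  tpr <- c(0, tpr, 1)
  # Sort by FPR (ties broken by TPR) so the curve runs left to right
  ord <- order(fpr, tpr)
  fpr <- fpr[ord]
  tpr <- tpr[ord]
  # Trapezoid area: width of each FPR step times the mean TPR over that step
  auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
  list(fpr = fpr, tpr = tpr, auc = auc)
}

res <- roc_from_points(tpr = sens, fpr = 1 - spec)
res$auc
```

The curve itself can then be drawn with `plot(res$fpr, res$tpr, type = "b", xlab = "FPR", ylab = "TPR")`.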

I also tried training the random forest with the following commands. This way the predictions were continuous and were accepted by the ROCR and pROC packages. However, I am not sure whether this is the correct way to do it. Can anyone comment on this method?

rf_object <- randomForest(data_matrix, label_vector)
pred <- predict(rf_object, test_data_matrix)

Thank you for taking the time to read my problem! I have spent a long time searching for an answer, and I appreciate any suggestions or advice.

asked Sep 11 '12 at 13:09 by Abhishek



1 Answer

Why don't you output class probabilities? That gives you a ranking of your predictions, which you can pass directly to any ROC package.

m = randomForest(data_matrix, labels)
predict(m,newdata_matrix,type='prob')

Note that, to use randomForest as a classification tool, labels must be a factor. If the labels are a numeric vector, randomForest fits a regression forest instead.
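As a sketch of how the class probabilities can be fed to a ROC package, the example below uses pROC on a two-class subset of the built-in iris data. The dataset, the train/test split, and the choice of "virginica" as the positive class are assumptions made for illustration; substitute your own data_matrix, labels, and test set.

```r
library(randomForest)
library(pROC)

set.seed(42)

# Illustrative two-class data: iris without the "setosa" class
dat <- droplevels(iris[iris$Species != "setosa", ])
train_idx <- sample(nrow(dat), 70)

x_train <- dat[train_idx, 1:4]
y_train <- dat$Species[train_idx]    # a factor, so a classification forest is fit
x_test  <- dat[-train_idx, 1:4]
y_test  <- dat$Species[-train_idx]

m <- randomForest(x_train, y_train)

# Matrix of vote fractions, one column per class
prob <- predict(m, x_test, type = "prob")

# Use the probability of the class treated as positive as the score.
# levels = c(control, case); direction "<" means controls score lower.
roc_obj <- roc(response = y_test,
               predictor = prob[, "virginica"],
               levels = c("versicolor", "virginica"),
               direction = "<")

auc_value <- as.numeric(auc(roc_obj))
auc_value
```

Calling `plot(roc_obj)` draws the curve. Because the probabilities come from a single fitted forest, every threshold is evaluated on the same model, unlike retraining with a different cutoff each time.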

answered Oct 04 '22 at 01:10 by jey1401
