 

ROC curve in R using ROCR package


Can someone please explain how to plot a ROC curve with ROCR? I know that I should first run:

prediction(predictions, labels, label.ordering = NULL) 

and then:

performance(prediction.obj, measure, x.measure="cutoff", ...) 

I am just not clear on what is meant by predictions and labels. I created a model with ctree and cforest, and I want the ROC curve for both of them so I can compare them in the end. In my case the class attribute is y_n, which I suppose should be used for the labels. But what about the predictions? Here are the steps of what I do (dataset name = bank_part):

pred <- cforest(y_n ~ ., bank_part)
tablebank <- table(predict(pred), bank_part$y_n)
prediction(tablebank, bank_part$y_n)

After running the last line I get this error:

Error in prediction(tablebank, bank_part$y_n) :
  Number of cross-validation runs must be equal for predictions and labels.

Thanks in advance!

Here's another example: I have a training dataset (bank_training) and a testing dataset (bank_testing), and I ran a randomForest as below:

bankrf <- randomForest(y ~ ., bank_training, mtry = 4, ntree = 2,
                       keep.forest = TRUE, importance = TRUE)
bankrf.pred <- predict(bankrf, bank_testing, type = 'response')

Now bankrf.pred is a factor object with labels c("0", "1"). Still, I don't know how to plot the ROC curve, because I get stuck at the prediction part. Here's what I do:

library(ROCR)
pred <- prediction(bankrf.pred$y, bank_testing$c(0,1))

But this is still incorrect, because I get the error message:

Error in bankrf.pred$y_n : $ operator is invalid for atomic vectors 
asked Jul 13 '12 by spektra



2 Answers

The predictions are your continuous scores from the classifier; the labels are the binary ground truth for each observation.

So something like the following should work:

> pred <- prediction(c(0.1, .5, .3, .8, .9, .4, .9, .5), c(0, 0, 0, 1, 1, 1, 1, 1))
> perf <- performance(pred, "tpr", "fpr")
> plot(perf)

to generate an ROC curve.
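As a quick aside (my addition, not part of the original answer), the same prediction object also gives you the AUC through ROCR's "auc" performance measure:

auc <- performance(pred, measure = "auc")@y.values[[1]]
auc  # area under the ROC curve for the toy scores above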

EDIT: It may be helpful for you to include sample reproducible code in the question (I'm having a hard time interpreting your comment).

There's no new code here, but... here's a function I use quite often for plotting an ROC:

plotROC <- function(truth, predicted, ...) {
  # truth: binary labels; predicted: continuous scores
  pred <- prediction(abs(predicted), truth)
  perf <- performance(pred, "tpr", "fpr")
  plot(perf, ...)
}
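A toy call, to show the intended usage (the numbers here are made up for illustration):

plotROC(c(0, 0, 0, 1, 1, 1), c(0.2, 0.4, 0.1, 0.8, 0.7, 0.9))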
answered Sep 21 '22 by Jeff Allen


Like @Jeff said, your predictions need to be continuous for ROCR's prediction function. require(randomForest); ?predict.randomForest shows that, by default, predict.randomForest returns a prediction on the original scale (class labels, in classification), whereas predict.randomForest(..., type = 'prob') returns probabilities of each class. So:

require(ROCR)
require(randomForest)  # needed for randomForest(); not loaded in the original snippet
data(iris)
iris$setosa <- factor(1 * (iris$Species == 'setosa'))
iris.rf <- randomForest(setosa ~ ., data = iris[,-5])
summary(predict(iris.rf, iris[,-5]))
summary(iris.preds <- predict(iris.rf, iris[,-5], type = 'prob'))
preds <- iris.preds[,2]
plot(performance(prediction(preds, iris$setosa), 'tpr', 'fpr'))

gives you what you want. Different classification packages require different commands for getting predicted probabilities -- sometimes it's predict(..., type='probs'), predict(..., type='prob')[,2], etc., so just check out the help files for each function you're calling.
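For instance, circling back to the cforest model from the question: here is a minimal sketch, assuming party's cforest and the asker's bank_part data frame with its binary factor y_n (OOB = TRUE is my assumption, so each row is scored by trees that did not train on it):

library(party)
library(ROCR)

# refit the forest from the question
cf <- cforest(y_n ~ ., data = bank_part)

# type = "prob" returns a list with one class-probability vector per observation
prob_list <- predict(cf, OOB = TRUE, type = "prob")

# keep the probability of the second factor level, one number per row
scores <- sapply(prob_list, function(p) p[2])

pred <- prediction(scores, bank_part$y_n)
plot(performance(pred, "tpr", "fpr"))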

answered Sep 20 '22 by lockedoff