Can someone please explain how to plot a ROC curve with ROCR? I know that I should first run:
prediction(predictions, labels, label.ordering = NULL)
and then:
performance(prediction.obj, measure, x.measure="cutoff", ...)
I am just not clear on what is meant by predictions and labels. I created a model with ctree and cforest, and I want the ROC curve for both of them so I can compare them in the end. In my case the class attribute is y_n, which I suppose should be used for the labels. But what about the predictions? Here are the steps of what I do (dataset name = bank_part):
pred <- cforest(y_n ~ ., bank_part)
tablebank <- table(predict(pred), bank_part$y_n)
prediction(tablebank, bank_part$y_n)
After running the last line I get this error:
Error in prediction(tablebank, bank_part$y_n) : Number of cross-validation runs must be equal for predictions and labels.
Thanks in advance!
Here's another example: I have a training dataset (bank_training) and a testing dataset (bank_testing), and I ran a randomForest as below:
bankrf <- randomForest(y ~ ., bank_training, mtry = 4, ntree = 2, keep.forest = TRUE, importance = TRUE)
bankrf.pred <- predict(bankrf, bank_testing, type = 'response')
Now bankrf.pred is a factor object with levels c("0", "1"). Still, I don't know how to plot the ROC curve, because I get stuck at the prediction part. Here's what I do:
library(ROCR)
pred <- prediction(bankrf.pred$y, bank_testing$c(0,1))
But this is still incorrect, because I get the error message:
Error in bankrf.pred$y_n : $ operator is invalid for atomic vectors
The basic unit of the pROC package is the roc function. It builds a ROC curve and, if requested, smooths it (smooth=TRUE), computes the AUC (auc=TRUE), computes a confidence interval (ci=TRUE), and plots the curve (plot=TRUE).
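For example, a minimal sketch with pROC (the labels and scores vectors here are made up for illustration):

library(pROC)
labels <- c(0, 0, 0, 1, 1, 1, 1, 1)                 # binary truth
scores <- c(0.1, 0.5, 0.3, 0.8, 0.9, 0.4, 0.9, 0.5) # continuous predictions
roc_obj <- roc(labels, scores, auc = TRUE, ci = TRUE, plot = TRUE)
roc_obj$auc   # area under the curve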
ROCR is a flexible tool for creating cutoff-parameterized 2D performance curves by freely combining any two of over 25 performance measures (new performance measures can be added using a standard interface).
The predictions are your continuous scores for the classification; the labels are the binary truth for each observation.
So something like the following should work:
> pred <- prediction(c(0.1, .5, .3, .8, .9, .4, .9, .5), c(0, 0, 0, 1, 1, 1, 1, 1))
> perf <- performance(pred, "tpr", "fpr")
> plot(perf)
to generate an ROC.
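The same prediction object can be combined with other measure pairs too; for instance, a precision/recall curve (just a sketch reusing pred from above, with ROCR's built-in "prec" and "rec" measures):

> perf.pr <- performance(pred, "prec", "rec")
> plot(perf.pr)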
EDIT: It may be helpful for you to include the sample reproducible code in the question (I'm having a hard time interpreting your comment).
There's no new code here, but... here's a function I use quite often for plotting an ROC:
plotROC <- function(truth, predicted, ...) {
  pred <- prediction(abs(predicted), truth)
  perf <- performance(pred, "tpr", "fpr")
  plot(perf, ...)
}
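A hypothetical call, assuming ROCR is loaded and using made-up truth/score vectors:

truth <- c(0, 0, 0, 1, 1, 1, 1, 1)
scores <- c(0.1, 0.5, 0.3, 0.8, 0.9, 0.4, 0.9, 0.5)
plotROC(truth, scores, col = "blue", main = "Example ROC")   # extra arguments are passed on to plot()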
Like @Jeff said, your predictions need to be continuous for ROCR's prediction function. require(randomForest); ?predict.randomForest shows that, by default, predict.randomForest returns a prediction on the original scale (class labels, in classification), whereas predict.randomForest(..., type = 'prob') returns probabilities for each class. So:
require(ROCR)
require(randomForest)
data(iris)
# binary target: is the species setosa?
iris$setosa <- factor(1 * (iris$Species == 'setosa'))
iris.rf <- randomForest(setosa ~ ., data = iris[, -5])
summary(predict(iris.rf, iris[, -5]))                                # class labels
summary(iris.preds <- predict(iris.rf, iris[, -5], type = 'prob'))   # class probabilities
preds <- iris.preds[, 2]                                             # probability of class "1" (setosa)
plot(performance(prediction(preds, iris$setosa), 'tpr', 'fpr'))
gives you what you want. Different classification packages require different commands for getting predicted probabilities -- sometimes it's predict(..., type = 'probs'), sometimes predict(..., type = 'prob')[, 2], etc., so just check out the help files for each function you're calling.
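If you also want a numeric AUC out of ROCR (not just the plot), here is a sketch reusing preds and iris$setosa from the snippet above:

pred.obj <- prediction(preds, iris$setosa)
unlist(performance(pred.obj, 'auc')@y.values)   # area under the ROC curve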