Can someone please explain how to plot a ROC curve with ROCR? I know that I should first run:
prediction(predictions, labels, label.ordering = NULL)
and then:
performance(prediction.obj, measure, x.measure="cutoff", ...)
I am just not clear on what is meant by predictions and labels. I created a model with ctree and cforest, and I want the ROC curve for both of them so I can compare them in the end. In my case the class attribute is y_n, which I suppose should be used for the labels. But what about the predictions? Here are the steps of what I do (dataset name = bank_part):
pred <- cforest(y_n ~ ., bank_part)
tablebank <- table(predict(pred), bank_part$y_n)
prediction(tablebank, bank_part$y_n)
After running the last line I get this error:
Error in prediction(tablebank, bank_part$y_n) : Number of cross-validation runs must be equal for predictions and labels.
Thanks in advance!
Here's another example: I have a training dataset (bank_training) and a testing dataset (bank_testing), and I ran a randomForest as below:
bankrf <- randomForest(y ~ ., bank_training, mtry = 4, ntree = 2, keep.forest = TRUE, importance = TRUE)
bankrf.pred <- predict(bankrf, bank_testing, type = 'response')
Now bankrf.pred is a factor object with levels c("0", "1"). Still, I don't know how to plot the ROC curve, because I get stuck at the prediction part. Here's what I do:
library(ROCR)
pred <- prediction(bankrf.pred$y, bank_testing$c(0,1))
But this is still incorrect, because I get the error message:
Error in bankrf.pred$y_n : $ operator is invalid for atomic vectors
The basic unit of the pROC package is the roc function. It builds a ROC curve and, if requested, smooths it (smooth=TRUE), computes the AUC (auc=TRUE), computes a confidence interval (ci=TRUE), and plots the curve (plot=TRUE).
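For example, a minimal sketch with pROC (the labels and scores vectors here are made up for illustration):

library(pROC)
labels <- c(0, 0, 0, 1, 1, 1, 1, 1)                 # binary truth
scores <- c(0.1, 0.5, 0.3, 0.8, 0.9, 0.4, 0.9, 0.5) # continuous predictions
roc_obj <- roc(labels, scores, auc = TRUE, ci = TRUE, plot = TRUE)
roc_obj$auc   # area under the curve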
ROCR is a flexible tool for creating cutoff-parameterized 2D performance curves by freely combining any two of over 25 performance measures (new performance measures can be added using a standard interface).
The predictions are your continuous scores for the classification; the labels are the binary truth for each observation.
So something like the following should work:
> pred <- prediction(c(0.1, .5, .3, .8, .9, .4, .9, .5), c(0, 0, 0, 1, 1, 1, 1, 1))
> perf <- performance(pred, "tpr", "fpr")
> plot(perf)
to generate an ROC.
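The same prediction object can be combined with other measure pairs too; for instance, a precision/recall curve (just a sketch reusing pred from above, with ROCR's built-in "prec" and "rec" measures):

> perf.pr <- performance(pred, "prec", "rec")
> plot(perf.pr)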
EDIT: It may be helpful for you to include the sample reproducible code in the question (I'm having a hard time interpreting your comment).
There's no new code here, but... here's a function I use quite often for plotting an ROC:
plotROC <- function(truth, predicted, ...) {
  pred <- prediction(abs(predicted), truth)
  perf <- performance(pred, "tpr", "fpr")
  plot(perf, ...)
}
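A hypothetical call, assuming ROCR is loaded and using made-up truth/score vectors:

truth <- c(0, 0, 0, 1, 1, 1, 1, 1)
scores <- c(0.1, 0.5, 0.3, 0.8, 0.9, 0.4, 0.9, 0.5)
plotROC(truth, scores, col = "blue", main = "Example ROC")   # extra arguments are passed on to plot()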
Like @Jeff said, your predictions need to be continuous for ROCR's prediction function. require(randomForest); ?predict.randomForest shows that, by default, predict.randomForest returns a prediction on the original scale (class labels, in classification), whereas predict.randomForest(..., type = 'prob') returns probabilities for each class. So:
require(ROCR)
require(randomForest)
data(iris)
# binary target: is the species setosa?
iris$setosa <- factor(1 * (iris$Species == 'setosa'))
iris.rf <- randomForest(setosa ~ ., data = iris[, -5])
summary(predict(iris.rf, iris[, -5]))                                # class labels
summary(iris.preds <- predict(iris.rf, iris[, -5], type = 'prob'))   # class probabilities
preds <- iris.preds[, 2]                                             # probability of class "1" (setosa)
plot(performance(prediction(preds, iris$setosa), 'tpr', 'fpr'))
gives you what you want. Different classification packages require different commands for getting predicted probabilities -- sometimes it's predict(..., type = 'probs'), sometimes predict(..., type = 'prob')[, 2], etc., so just check out the help files for each function you're calling.
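If you also want a numeric AUC out of ROCR (not just the plot), here is a sketch reusing preds and iris$setosa from the snippet above:

pred.obj <- prediction(preds, iris$setosa)
unlist(performance(pred.obj, 'auc')@y.values)   # area under the ROC curve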