I am performing logistic regression following this page. My code is below:
    mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
    mylogit <- glm(admit ~ gre, data = mydata, family = "binomial")
    summary(mylogit)
    prob <- predict(mylogit, type = "response")
    mydata$prob <- prob
After running this code, the mydata data frame contains both the 'admit' column and the new 'prob' column. Shouldn't these two columns be sufficient to get the ROC curve? How can I get the ROC curve?
Secondly, looking at mydata, it seems that the model is predicting the probability of admit = 1. Is that correct? How can I find out which particular event the model is predicting?
Thanks
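On the second question: with family = "binomial", glm() models the probability of the "success" outcome. For a numeric 0/1 response that is P(y = 1); for a factor response, the first level is treated as failure and every other level as success. Since admit is coded 0/1, predict(mylogit, type = "response") should return P(admit = 1). The sketch below checks this on simulated data (the data and variable names are made up for illustration) by fitting the same model with a numeric and a factor response.

```r
# Simulated toy data (hypothetical, for illustration only)
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 + 1.2 * x))

# Fit once with a numeric 0/1 response ...
fit_num <- glm(y ~ x, family = "binomial")

# ... and once with a factor whose levels are c("no", "yes");
# glm() treats the first level ("no") as failure, "yes" as success
y_fac <- factor(ifelse(y == 1, "yes", "no"), levels = c("no", "yes"))
fit_fac <- glm(y_fac ~ x, family = "binomial")

p_num <- predict(fit_num, type = "response")
p_fac <- predict(fit_fac, type = "response")

# Both fits return the same probabilities: P(y = 1) = P(y_fac == "yes")
print(all.equal(unname(p_num), unname(p_fac)))

# Sanity check: with an intercept, the mean fitted probability
# matches the observed share of 1s
print(c(mean(p_num), mean(y)))
```

If you ever need the probability of the other outcome, it is simply 1 - prob.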
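UPDATE: The three commands below seem very useful (g is the roc object created with pROC in the answer below). coords(g, "best") returns the "best" cut-off on the ROC curve; by default pROC picks the threshold that maximizes Youden's J = sensitivity + specificity - 1, which is not necessarily the threshold with maximum accuracy. The next commands classify each observation at that cut-off and build a confusion matrix with caret's confusionMatrix().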
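    coords(g, "best")
    mydata$prediction <- ifelse(prob >= 0.3126844, 1, 0)
    library(caret)
    # confusionMatrix() expects factors with the same levels
    confusionMatrix(factor(mydata$prediction, levels = c(0, 1)),
                    factor(mydata$admit, levels = c(0, 1)),
                    positive = "1")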
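To see what the "best" cut-off means, here is a base-R sketch of the Youden criterion, assuming 0/1 labels in actual and predicted probabilities in prob. The helper best_youden and the toy numbers are hypothetical. Note that pROC reports thresholds as midpoints between adjacent predicted values, so its printed cut-off can differ slightly from the one found here even when the resulting classification is identical.

```r
# Hypothetical helper: threshold maximizing Youden's J = sensitivity + specificity - 1
best_youden <- function(actual, prob) {
  thresholds <- sort(unique(prob))
  j <- sapply(thresholds, function(t) {
    pred <- prob >= t
    sens <- mean(pred[actual == 1])    # true positive rate
    spec <- mean(!pred[actual == 0])   # true negative rate
    sens + spec - 1
  })
  thresholds[which.max(j)]             # first threshold with the largest J
}

# Toy example (made-up numbers)
actual <- c(0, 0, 1, 0, 1, 1, 0, 1)
prob   <- c(0.10, 0.20, 0.35, 0.30, 0.55, 0.60, 0.65, 0.90)

cutoff <- best_youden(actual, prob)
prediction <- ifelse(prob >= cutoff, 1, 0)

# Confusion matrix in base R (rows = predicted, columns = actual)
cm <- table(Predicted = factor(prediction, levels = 0:1),
            Actual    = factor(actual, levels = 0:1))
print(cutoff)
print(cm)
```

Here the cut-off is 0.35: all four positives are caught and one of the four negatives is misclassified.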
The area under the ROC curve (AUC) is an aggregate metric that evaluates how well a logistic regression model separates positive from negative outcomes across all possible cutoffs. It ranges from 0 to 1: 0.5 corresponds to random guessing, values below 0.5 are worse than random, and the larger it is, the better.
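One way to read the AUC: it is the probability that a randomly chosen positive case gets a higher predicted probability than a randomly chosen negative case, with ties counted as one half (the scaled Mann-Whitney statistic). The sketch below computes it by comparing every positive/negative pair; the helper auc_pairs and the toy data are hypothetical. Because only comparisons are involved, any strictly increasing transformation of the scores (such as the logit) leaves the AUC unchanged.

```r
# AUC as a concordance probability over all (positive, negative) pairs
auc_pairs <- function(actual, prob) {
  pos <- prob[actual == 1]
  neg <- prob[actual == 0]
  # outer() builds the matrix of all positive-vs-negative comparisons;
  # a win counts 1, a tie counts 0.5
  wins <- outer(pos, neg, ">") + 0.5 * outer(pos, neg, "==")
  mean(wins)
}

# Toy example (made-up numbers)
actual <- c(0, 0, 1, 0, 1, 1, 0, 1)
prob   <- c(0.10, 0.20, 0.35, 0.30, 0.55, 0.60, 0.65, 0.90)

print(auc_pairs(actual, prob))          # 13 of 16 pairs ranked correctly
print(auc_pairs(actual, qlogis(prob)))  # same value: only the ordering matters
```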
The roc() function from pROC takes the actual outcomes and the predicted values as arguments and returns an ROC curve object. To get the area under that curve, pass the roc object to the auc() function, which returns its AUC.
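As a rough illustration of what roc() and auc() compute, the base-R sketch below sweeps the threshold from high to low, records the false and true positive rates at each step, and sums trapezoids under the resulting curve. The helpers roc_points and trapezoid_auc are hypothetical stand-ins, not pROC's internals. pROC's default plot shows specificity on a reversed x-axis instead of the false positive rate (1 - specificity), which is the same curve.

```r
# Hypothetical base-R stand-in for roc(): one (FPR, TPR) point per threshold
roc_points <- function(actual, prob) {
  # Thresholds from high to low, plus Inf so the curve starts at (0, 0)
  thresholds <- c(Inf, sort(unique(prob), decreasing = TRUE))
  tpr <- sapply(thresholds, function(t) mean(prob[actual == 1] >= t))
  fpr <- sapply(thresholds, function(t) mean(prob[actual == 0] >= t))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Hypothetical stand-in for auc(): trapezoidal rule over consecutive points
trapezoid_auc <- function(pts) {
  dx <- diff(pts$fpr)
  avg_y <- (head(pts$tpr, -1) + tail(pts$tpr, -1)) / 2
  sum(dx * avg_y)
}

# Toy example (made-up numbers)
actual <- c(0, 0, 1, 0, 1, 1, 0, 1)
prob   <- c(0.10, 0.20, 0.35, 0.30, 0.55, 0.60, 0.65, 0.90)

pts <- roc_points(actual, prob)
print(pts)
print(trapezoid_auc(pts))
# plot(pts$fpr, pts$tpr, type = "l")  # draws the ROC curve
```

The curve only needs the true labels and the predicted scores, which is why the 'admit' and 'prob' columns are enough.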
The ROC curve depends only on how the predictions rank the observations relative to the actual outcomes, so the 'admit' and 'prob' columns are indeed sufficient. You can compute the curve with the pROC package as follows:
    mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
    mylogit <- glm(admit ~ gre, data = mydata, family = "binomial")
    summary(mylogit)
    prob <- predict(mylogit, type = "response")
    mydata$prob <- prob
    library(pROC)
    g <- roc(admit ~ prob, data = mydata)
    plot(g)
    auc(g)  # area under the ROC curve