Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Youden's index on ROC curve and the corresponding cutoff value in a predictor variable seems strange in pROC package

Tags:

r

proc

I am trying to identify a threshold value in a variable (minimal clinically important difference, MCID) based on Youden's index on a ROC curve, using the pROC package in R. This should be fairly straightforward, but the threshold I get doesn't seem right.

As far as I understand based on documentation, the "threshold" coefficient is supposed to be value that I am looking for. Meaning that, if the threshold is 0.33, then a baseline value of 5 should change to at least 4.67 at follow-up to be considered a significant change.

Since pROC gives me several coefficients in the roc output, I wanted to make sure that I got it right, so I just tried multiplying my predictor by 10. I expected that I would then get 3.3 instead of 0.33, but this does not seem to be the case.

Below is a reproducible example based on the aSAH dataset available in pROC. I have used the ndka (plasma concentration) to predict the outcome.

I was hoping someone could tell me what I am doing wrong in the code, so that I can get the correct MCID value.

install.packages("pROC")
library(pROC)
#https://cran.r-project.org/web/packages/pROC/pROC.pdf

# Data
df <- aSAH # demo data
df$ndka10 <-  10 * df$ndka

# MCID by Youden's Index cutoff
model <- glm(outcome ~ ndka, data = df, family = binomial)
df$predicted_prob <- predict(model, type = "response")
roc_out <- roc(df$outcome, df$predicted_prob, direction = "<")
coords(roc_out, "best", ret="all") 

# MCID by Youden's Index cutoff, 10
model2 <- glm(outcome ~ ndka10, data = df, family = binomial)
df$predicted_prob2 <- predict(model2, type = "response")
roc_out2 <- roc(df$outcome, df$predicted_prob2, direction = "<")
coords(roc_out2, "best", ret="all") 

#ndka cutoff: ?
like image 252
jacerl980 Avatar asked Oct 24 '25 14:10

jacerl980


1 Answers

You are putting the predicted probability in the continuous value to determine the threshold. Therefore, 0.3348 is the threshold on the predicted probability scale.

In your case, your model estimates (at least for me when I run it) logORs of having poor outcome with an augmentation of 1 of ndka. The threshold means that when the predicted probability is greater than 0.3348, you consider it a poor outcome.

If you want to get the ndka value associated with that predicted probability, you need to invert the formula logit(p)=Intercept + Beta * ndka.

ndka = (logit(p) - Intercept) / Beta

as.numeric((qlogis(.3348) - coefficients(model)[1]) / coefficients(model)[2])
[1] 11.08061

For model 2, you multiplied ndka by 10, so you need to revert that state at the end also by dividing the result by 10 (you find ndka10 if you don't).

as.numeric((qlogis(.3348) - coefficients(model2)[1]) / coefficients(model2)[2]) / 10
[1] 11.08061

At that threshold, your model considers all ndka of 11.08 and above to be poor outcome. We can check for example sensitivity (0.71 given by coords(...)) and specificity (0.51 given by coords(...)).

# Sensitivity
mean(df$ndka[df$outcome == "Poor"] > 11.08)
[1] 0.7073171
# Specificity
1 - mean(df$ndka[df$outcome == "Good"] > 11.08)
[1] 0.5138889

Concerning your question about the multiplying by 10, it indeed does not change the result for the ROC curve. If you look into your model coefficients, you will see that the logOR for ndka10 in model 2 is 10 times less than the logOR for ndka in model, and Intercept does not change. Thus, if you take the equation Intercept + Beta * ndka, it becomes Intercept + Beta / 10 * ndka10 = Intercept + Beta / 10 * ndka * 10 = Intercept + Beta * ndka. The predicted probability is the same for both models.

I hope this will help you!

like image 50
Guillaume Mulier Avatar answered Oct 26 '25 04:10

Guillaume Mulier