I am trying to identify a threshold value in a variable (minimal clinically important difference, MCID) based on Youden's index on a ROC curve, using the pROC package in R. This should be fairly straightforward, but the threshold I get doesn't seem right.
As far as I understand based on documentation, the "threshold" coefficient is supposed to be value that I am looking for. Meaning that, if the threshold is 0.33, then a baseline value of 5 should change to at least 4.67 at follow-up to be considered a significant change.
Since pROC gives me several coefficients in the roc output, I wanted to make sure that I got it right, so I just tried multiplying my predictor by 10. I expected that I would then get 3.3 instead of 0.33, but this does not seem to be the case.
Below is a reproducible example based on the aSAH dataset available in pROC. I have used the ndka (plasma concentration) to predict the outcome.
I was hoping someone could tell me what I am doing wrong in the code, so that I can get the correct MCID value.
install.packages("pROC")
library(pROC)
#https://cran.r-project.org/web/packages/pROC/pROC.pdf
# Data
df <- aSAH # demo data
df$ndka10 <- 10 * df$ndka
# MCID by Youden's Index cutoff
model <- glm(outcome ~ ndka, data = df, family = binomial)
df$predicted_prob <- predict(model, type = "response")
roc_out <- roc(df$outcome, df$predicted_prob, direction = "<")
coords(roc_out, "best", ret="all")
# MCID by Youden's Index cutoff, 10
model2 <- glm(outcome ~ ndka10, data = df, family = binomial)
df$predicted_prob2 <- predict(model2, type = "response")
roc_out2 <- roc(df$outcome, df$predicted_prob2, direction = "<")
coords(roc_out2, "best", ret="all")
#ndka cutoff: ?
You are putting the predicted probability in the continuous value to determine the threshold. Therefore, 0.3348 is the threshold on the predicted probability scale.
In your case, your model estimates (at least for me when I run it) logORs of having poor outcome with an augmentation of 1 of ndka. The threshold means that when the predicted probability is greater than 0.3348, you consider it a poor outcome.
If you want to get the ndka value associated with that predicted probability, you need to invert the formula logit(p)=Intercept + Beta * ndka.
ndka = (logit(p) - Intercept) / Beta
as.numeric((qlogis(.3348) - coefficients(model)[1]) / coefficients(model)[2])
[1] 11.08061
For model 2, you multiplied ndka by 10, so you need to revert that state at the end also by dividing the result by 10 (you find ndka10 if you don't).
as.numeric((qlogis(.3348) - coefficients(model2)[1]) / coefficients(model2)[2]) / 10
[1] 11.08061
At that threshold, your model considers all ndka of 11.08 and above to be poor outcome. We can check for example sensitivity (0.71 given by coords(...)) and specificity (0.51 given by coords(...)).
# Sensitivity
mean(df$ndka[df$outcome == "Poor"] > 11.08)
[1] 0.7073171
# Specificity
1 - mean(df$ndka[df$outcome == "Good"] > 11.08)
[1] 0.5138889
Concerning your question about the multiplying by 10, it indeed does not change the result for the ROC curve. If you look into your model coefficients, you will see that the logOR for ndka10 in model 2 is 10 times less than the logOR for ndka in model, and Intercept does not change. Thus, if you take the equation Intercept + Beta * ndka, it becomes Intercept + Beta / 10 * ndka10 = Intercept + Beta / 10 * ndka * 10 = Intercept + Beta * ndka. The predicted probability is the same for both models.
I hope this will help you!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With