I am going nuts trying to figure this out. How can I define, in R, the reference level to use in a binary logistic regression? What about multinomial logistic regression? Right now my code is:
logistic.train.model3 <- glm(class~ x+y+z,
family=binomial(link=logit), data=auth, na.action = na.exclude)
My response variable takes the values "YES" and "NO". I want to predict the probability of someone responding with "YES".
I DO NOT want to recode the variable to 0 / 1. Is there a way I can tell the model to predict "YES"?
Thank you for your help.
To set the reference level of a factor manually in R, use the relevel() function. relevel() reorders the levels of a factor so that the level you specify comes first and the others are moved down.
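As a minimal sketch of what relevel() does, the snippet below uses a made-up response vector (resp is a hypothetical name, not from the question). Only the order of the levels changes; the stored values stay the same.

```r
# Hypothetical response vector; factor levels are sorted alphabetically by default.
resp <- factor(c("YES", "NO", "NO", "YES"))
levels(resp)                 # "NO" comes first, so "NO" is the reference level

# Move "YES" to the front of the level order.
resp_yes_ref <- relevel(resp, ref = "YES")
levels(resp_yes_ref)         # "YES" is now the reference level

# The observations themselves are unchanged.
as.character(resp_yes_ref)
```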
What are reference levels? The reference level of a categorical predictor variable is often considered the “baseline” or “usual” value that is observed for the given variable. In the process of dummy coding, the variable for the reference level is left out since it would simply contain “0” for every observation.
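To see the dummy coding described above, one option is to inspect the design matrix with model.matrix(). This is only an illustration with a made-up three-level predictor (grp and grp_c are hypothetical names), and it assumes R's default treatment contrasts.

```r
# Hypothetical three-level predictor; "a" is the reference level by default.
grp <- factor(c("a", "b", "c", "a"))
mm_default <- model.matrix(~ grp)
colnames(mm_default)         # an intercept plus dummies for "b" and "c"; none for "a"

# After relevel(), "c" is the reference and gets no dummy column.
grp_c <- relevel(grp, ref = "c")
mm_releveled <- model.matrix(~ grp_c)
colnames(mm_releveled)       # an intercept plus dummies for "a" and "b"
```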
A “reference group” is the group we choose as the baseline, so that every odds ratio compares another group to it. Logistic regression allows us to look at several predictors (for example sex, weight, and age) simultaneously.
Assuming you have class saved as a factor, use the relevel() function:
auth$class <- relevel(auth$class, ref = "YES")
Note that, when using auth$class <- relevel(auth$class, ref = "YES"), you are actually predicting "NO".
To predict "YES", the reference level must be "NO". Therefore, you have to use auth$class <- relevel(auth$class, ref = "NO").
It's a common mistake, because most of the time the outcome variable is a vector of 0 and 1, and people want to predict 1.
When such a vector is treated as a factor, the reference level is 0 (see below), so people are effectively predicting 1. Likewise, your reference level must be "NO" so that you predict "YES".
set.seed(1234)
x1 <- sample(c(0, 1), 50, replace = TRUE)
x2 <- factor(x1)
str(x2)
#Factor w/ 2 levels "0","1": 1 2 2 2 2 2 1 1 2 2 ...
You can see that the reference level is 0.
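To check this behaviour yourself, here is a rough sketch on simulated data (x, resp, fit_yes and fit_no are made-up names, not the auth data from the question). With a factor response, glm() with family = binomial treats the first level as "failure" and models the probability of the other level, so swapping the reference level flips the sign of every coefficient.

```r
set.seed(1)
# Simulated data (hypothetical): larger x makes "YES" more likely.
x <- rnorm(200)
resp <- factor(ifelse(runif(200) < plogis(2 * x), "YES", "NO"))

levels(resp)                 # "NO" is first, so glm() models P(resp == "YES")
fit_yes <- glm(resp ~ x, family = binomial(link = "logit"))

# With "YES" as the first level, glm() models P(resp == "NO") instead.
resp_flipped <- relevel(resp, ref = "YES")
fit_no <- glm(resp_flipped ~ x, family = binomial(link = "logit"))

coef(fit_yes)                # slope on x is positive
coef(fit_no)                 # same magnitudes, opposite signs

# Fitted values are P(YES) in one model and P(NO) in the other.
head(fitted(fit_yes) + fitted(fit_no))
```

So if the levels of your response are already in the order "NO", "YES" (the alphabetical default), no releveling is needed to predict "YES".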