
Logistic regression - defining reference level in R

I am going nuts trying to figure this out. How can I define the reference level to use in a binary logistic regression in R? What about multinomial logistic regression? Right now my code is:

logistic.train.model3 <- glm(class ~ x + y + z,
                             family = binomial(link = "logit"),
                             data = auth, na.action = na.exclude)

My response variable takes the values "YES" and "NO". I want to predict the probability of someone responding with "YES".

I DO NOT want to recode the variable to 0/1. Is there a way I can tell the model to predict "YES"?

Thank you for your help.

blast00 asked Apr 25 '14 00:04




2 Answers

Assuming you have class saved as a factor, use the relevel() function:

auth$class <- relevel(auth$class, ref = "YES")
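A quick way to see what relevel() does is to inspect levels() before and after the call. The sketch below uses a made-up factor, cls, standing in for auth$class (the real data is not shown in the question), so the values are illustrative only.

```r
# Hypothetical stand-in for auth$class; the real data is not shown.
cls <- factor(c("YES", "NO", "NO", "YES", "NO"))

# Levels are sorted alphabetically by default, so "NO" comes first.
levels(cls)

# relevel() moves the requested level to the front; the data are unchanged.
cls_yes_first <- relevel(cls, ref = "YES")
levels(cls_yes_first)

# The first level is the reference (baseline) level.
levels(cls_yes_first)[1]
```

Checking levels(...)[1] before fitting is a cheap way to confirm which level the model will treat as the baseline.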
smrt1119 answered Oct 26 '22 23:10



Note that when you use auth$class <- relevel(auth$class, ref = "YES"), you are actually predicting "NO". For a factor response with family = binomial, glm() treats the first level as failure and all other levels as success, so it models the probability of the non-reference level.

To predict "YES", the reference level must be "NO". Therefore, you have to use auth$class <- relevel(auth$class, ref = "NO").

It's a common mistake, since most of the time the outcome variable is a vector of 0s and 1s and people want to predict 1.

When such a vector is converted to a factor, the reference level is 0 (see below), so people are effectively predicting 1. Likewise, your reference level must be "NO" so that you predict "YES".

set.seed(1234)
x1 <- sample(c(0, 1), 50, replace = TRUE)
x2 <- factor(x1)
str(x2)
# Factor w/ 2 levels "0","1": 1 2 2 2 2 2 1 1 2 2 ...

You can see that the reference level is 0.
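To check the direction for a "YES"/"NO" response, here is a minimal sketch on simulated data. The data frame d, the predictor x, and the column names are all made up for illustration; they are not the asker's auth data. The data are generated so that larger x makes "YES" more likely, so the sign of the slope shows which outcome glm() is modeling.

```r
set.seed(42)
n <- 500
x <- rnorm(n)

# Simulated (hypothetical) outcome: larger x makes "YES" more likely.
p_yes <- plogis(2 * x)
d <- data.frame(x = x,
                cls = factor(ifelse(runif(n) < p_yes, "YES", "NO")))

# Same response with each choice of reference level.
d$cls_no_ref  <- relevel(d$cls, ref = "NO")
d$cls_yes_ref <- relevel(d$cls, ref = "YES")

# Reference "NO": glm() models P(cls == "YES"), so the slope is positive.
fit_no_ref <- glm(cls_no_ref ~ x, family = binomial, data = d)

# Reference "YES": glm() models P(cls == "NO"), so the slope flips sign.
fit_yes_ref <- glm(cls_yes_ref ~ x, family = binomial, data = d)

coef(fit_no_ref)
coef(fit_yes_ref)

# The fitted probabilities from the two models are complements of each other.
range(fitted(fit_no_ref) + fitted(fit_yes_ref))
```

The two fits describe the same model from opposite sides, so only the coefficient signs and the meaning of the fitted probabilities change.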
nghauran answered Oct 27 '22 00:10
