I have a logistic regression model that I built with the glmnet package. My response variable is a factor with two levels, which I will refer to as "a" and "b".
The mathematics of logistic regression label one of the two classes "0" and the other "1", and each feature coefficient is positive, negative, or zero. If the coefficient of a feature f is positive, then increasing the value of f for a test observation x increases the probability that the model assigns x to class "1".
My question is: given a glmnet model, how do you know how glmnet mapped your data's factor levels {"a", "b"} to the underlying {"0", "1"} labels? You need to know this to interpret the model's coefficients properly.
You can figure this out manually by experimenting with the output of the predict function on toy observations, but it would be nice to know how glmnet implicitly handles that mapping, to speed up the interpretation process.
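For reference, the manual check I mean looks something like this (toy data with arbitrary names; the cross-tabulation at the end is the illustrative part):

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 3), ncol = 3)
y <- factor(sample(c("a", "b"), 100, replace = TRUE))

fit <- glmnet(x, y, family = "binomial")

## Which class do the type = "response" probabilities refer to?
p  <- predict(fit, newx = x, s = 0.01, type = "response")
cl <- predict(fit, newx = x, s = 0.01, type = "class")

## Rows with p > 0.5 line up with the "b" label, so the returned
## probabilities are P(y = "b"), i.e. "b" plays the role of class "1"
table(class = as.vector(cl), prob_gt_half = as.vector(p > 0.5))
```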
Thank you!
Have a look at ?glmnet
(page 9 of https://cran.r-project.org/web/packages/glmnet/glmnet.pdf):
y
response variable. ... For family="binomial" should be either a factor
with two levels, or a two-column matrix of counts or proportions (the
second column is treated as the target class; for a factor, the last
level in alphabetical order is the target class) ...
Isn't it clear now? If "a" and "b" are your factor levels, "a" is coded as 0 while "b" is coded as 1.
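You can see this 0/1 coding directly, without fitting anything, by looking at the factor's internal integer codes in base R:

```r
y <- factor(c("b", "a", "b"))
levels(y)          ## "a" "b" -- the last level is the target class
as.integer(y) - 1  ## 1 0 1  -- "a" is coded 0, "b" is coded 1
```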
Such treatment is standard: it follows from how R codes factors, either automatically in alphabetical order or in whatever order you specify yourself. Look at:
## automatic coding by R based on alphabetical order
set.seed(0); y1 <- factor(sample(letters[1:2], 10, replace = TRUE))
## manual coding
set.seed(0); y2 <- factor(sample(letters[1:2], 10, replace = TRUE),
levels = c("b", "a"))
# > y1
# [1] b a a b b a b b b b
# Levels: a b
# > y2
# [1] b a a b b a b b b b
# Levels: b a
# > levels(y1)
# [1] "a" "b"
# > levels(y2)
# [1] "b" "a"
Whether you use glmnet() or plain glm(), the same coding applies.
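A minimal sketch with glm() (base R, no extra packages; the data are made up so that large x goes with "b"):

```r
## Construct toy data where large x tends to co-occur with level "b"
set.seed(0)
x <- rnorm(50)
y <- factor(ifelse(x + rnorm(50) > 0, "b", "a"))

fit <- glm(y ~ x, family = binomial)
coef(fit)["x"]  ## should be positive: increasing x raises the modelled
                ## probability, which is P(y = "b") -- the last level
```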