Interpreting logistic regression feature coefficient values in sklearn

Question

I have fit a logistic regression model to my data. Imagine I have four features: 1) which condition the participant received, 2) whether the participant had any prior knowledge/background about the phenomenon tested (a binary response in the post-experimental questionnaire), 3) time spent on the experimental task, and 4) participant age. I am trying to predict whether participants ultimately chose option A or option B. My logistic regression outputs the following feature coefficients with clf.coef_:

[[-0.68120795 -0.19073737 -2.50511774  0.14956844]]

If option A is my positive class, does this output mean that feature 3 is the most important feature for binary classification and has a negative relationship with participants choosing option A (note: I have not normalized/re-scaled my data)? I want to ensure that my understanding of the coefficients, and the information I can extract from them, is correct so I don't make any generalizations or false assumptions in my analysis.

Thanks for your help!

rocksteady · Accepted Answer

You are on the right track. If all features are on a similar scale, a coefficient with a larger absolute value (positive or negative) means a larger effect, all else being equal.

However, if your data isn't normalized, Marat is correct that the magnitudes of the coefficients can't be compared across features without context. For instance, you could get a very different coefficient for time spent on the task just by recording it in minutes instead of seconds.
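To make the point about units concrete, here is a minimal sketch on made-up data (the two features, the sample size and the true effects are all hypothetical, not the asker's data). It fits the same model with time in seconds and in minutes, and then again after standardizing with StandardScaler:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000
# Hypothetical data: time on task (seconds) and age (years)
time_sec = rng.normal(300, 60, n)
age = rng.normal(35, 10, n)
true_log_odds = 2.0 - 0.01 * time_sec + 0.02 * age
y = (rng.random(n) < 1 / (1 + np.exp(-true_log_odds))).astype(int)

X_sec = np.column_stack([time_sec, age])
X_min = np.column_stack([time_sec / 60, age])  # same data, time in minutes

# Raw features: a large C makes the default L2 penalty negligible,
# so the only difference between the two fits is the unit of time.
clf_sec = LogisticRegression(C=1e6, max_iter=10_000).fit(X_sec, y)
clf_min = LogisticRegression(C=1e6, max_iter=10_000).fit(X_min, y)
print("time coef, seconds:", clf_sec.coef_[0][0])
print("time coef, minutes:", clf_min.coef_[0][0])  # about 60x the seconds one

# Standardized features: each coefficient is the change in log odds per
# standard deviation, so the unit of time no longer matters.
std_sec = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_sec, y)
std_min = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_min, y)
print(std_sec[-1].coef_[0], std_min[-1].coef_[0])  # (nearly) identical

# Rank features by the absolute value of their standardized coefficient
names = ["time", "age"]
ranking = sorted(zip(names, std_sec[-1].coef_[0]), key=lambda p: abs(p[1]), reverse=True)
print(ranking)
```

Standardizing puts every coefficient on a per-standard-deviation scale, which makes the magnitudes roughly comparable. It also means the default L2 penalty treats the features evenly; on raw features, how strongly a coefficient is shrunk depends on the units it happens to be measured in.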

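I can't see whether your model includes a non-zero intercept (clf.intercept_), but keep in mind that logistic regression coefficients are log odds ratios: each one is the change in the log odds of the positive class per one-unit increase in that feature, holding the other features fixed. Exponentiating a coefficient gives an odds ratio, and to get a probability, which is more directly interpretable, you need to combine the intercept and all the coefficients for a particular set of feature values.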

Check out this page for a good explanation: https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-how-do-i-interpret-odds-ratios-in-logistic-regression/
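Applied to the coefficients posted in the question, the step from log odds ratios to odds ratios looks like this (a sketch; only the four numbers come from the question):

```python
import numpy as np

# Coefficients copied from the question's clf.coef_ output (features 1-4)
coef = np.array([-0.68120795, -0.19073737, -2.50511774, 0.14956844])

# exp(coefficient) = odds ratio: the factor by which the odds of the
# positive class are multiplied per one-unit increase in that feature,
# holding the other features fixed.
odds_ratios = np.exp(coef)
for i, (b, ratio) in enumerate(zip(coef, odds_ratios), start=1):
    print(f"feature {i}: coef = {b:+.3f}, odds ratio = {ratio:.3f}")
```

Read this way, each additional unit of time on task (in whatever unit it was recorded) multiplies the odds by roughly 0.08, which is exactly why the units matter. Also check clf.classes_ before reading the signs: in the binary case sklearn reports coef_ for clf.classes_[1], the label that sorts second, which may be option B rather than option A.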

Andreas Martinson · Answer

Logistic regression returns information in log odds, so to get a probability you first convert log odds to odds using np.exp and then take odds/(1 + odds).

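To convert to probabilities, apply this to the log odds the model predicts for an observation, i.e. the intercept plus each coefficient times its feature value (clf.decision_function(X) in sklearn). Applying it to each coefficient on its own with a list comprehension:

import numpy as np
[np.exp(x)/(1 + np.exp(x)) for x in clf.coef_[0]]

only gives the probability for an observation whose log odds equal that single coefficient (zero intercept, that feature equal to 1 and every other feature equal to 0).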

This page had an explanation in R for converting log odds that I referenced: https://sebastiansauer.github.io/convert_logit2prob/
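A minimal sketch of the log odds to probability conversion on synthetic data (the data and the true coefficients are made up for illustration): it builds each row's log odds from clf.intercept_ and clf.coef_, converts them with odds/(1 + odds), and checks the result against sklearn's own decision_function and predict_proba.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the four features; not the asker's data
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
true_coef = np.array([-0.7, -0.2, -2.5, 0.15])
y = (X @ true_coef + rng.logistic(size=500) > 0).astype(int)
clf = LogisticRegression().fit(X, y)

# Predicted log odds for each row: intercept + sum of coefficient * feature
log_odds = clf.intercept_[0] + X @ clf.coef_[0]

# log odds -> odds -> probability
odds = np.exp(log_odds)
prob = odds / (1 + odds)

# Same numbers sklearn computes itself for clf.classes_[1]
print(np.allclose(log_odds, clf.decision_function(X)))   # True
print(np.allclose(prob, clf.predict_proba(X)[:, 1]))     # True
```

The probability depends on the whole row of feature values plus the intercept, which is why the conversion is applied to a prediction rather than to one coefficient at a time.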

Tags: python, scikit-learn, logistic-regression, feature-selection, coefficients

Jane Sully
