
Interpreting logistic regression feature coefficient values in sklearn

I have fit a logistic regression model to my data. Imagine, I have four features: 1) which condition the participant received, 2) whether the participant had any prior knowledge/background about the phenomenon tested (binary response in post-experimental questionnaire), 3) time spent on the experimental task, and 4) participant age. I am trying to predict whether participants ultimately chose option A or option B. My logistic regression outputs the following feature coefficients with clf.coef_:

[[-0.68120795 -0.19073737 -2.50511774  0.14956844]]

If option A is my positive class, does this output mean that feature 3 is the most important feature for binary classification and has a negative relationship with participants choosing option A (note: I have not normalized/re-scaled my data)? I want to ensure that my understanding of the coefficients, and the information I can extract from them, is correct so I don't make any generalizations or false assumptions in my analysis.
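For reference, here is roughly how I fit the model (the data below is a synthetic stand-in; my real features are the four described above):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 100
# Placeholder columns standing in for my four features
X = np.column_stack([
    rng.integers(0, 2, n),     # condition (binary)
    rng.integers(0, 2, n),     # prior knowledge (binary)
    rng.uniform(1, 10, n),     # time on task
    rng.integers(18, 65, n),   # age
])
y = rng.integers(0, 2, n)      # 1 = chose option A, 0 = chose option B

clf = LogisticRegression().fit(X, y)
print(clf.coef_)               # one row of four coefficients, shape (1, 4)
```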

Thanks for your help!

asked Jun 24 '18 by Jane Sully

2 Answers

You are on the right track. If the features are all on a very similar scale, a larger positive or negative coefficient means a larger effect, all else being equal.

However, since your data isn't normalized, Marat is correct that the magnitudes of the coefficients don't mean much without context. For instance, you could get very different coefficients simply by changing a feature's units of measure to something larger or smaller.
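To illustrate the units point, here is a minimal sketch with one synthetic feature fit on two different scales (the data and scale factor are made up for demonstration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))              # "time on task" in minutes
y = (X[:, 0] + rng.normal(0, 2, 200) > 5).astype(int)

clf_minutes = LogisticRegression().fit(X, y)       # feature in minutes
clf_seconds = LogisticRegression().fit(X * 60, y)  # same feature in seconds

# Same underlying relationship, but the coefficient shrinks
# by roughly the scale factor when the units change
print(clf_minutes.coef_[0][0], clf_seconds.coef_[0][0])
```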

I can't tell whether you've included a non-zero intercept here, but keep in mind that logistic regression coefficients are log odds ratios: exponentiating a coefficient gives an odds ratio, which you can then transform into a probability for something more directly interpretable.

Check out this page for a good explanation: https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-how-do-i-interpret-odds-ratios-in-logistic-regression/
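For example, exponentiating the coefficients you posted gives the corresponding odds ratios (values above 1 mean higher odds of the positive class per unit increase, values below 1 mean lower odds):

```python
import numpy as np

# Coefficients from clf.coef_ in the question
coefs = np.array([-0.68120795, -0.19073737, -2.50511774, 0.14956844])

# Multiplicative change in the odds of the positive class
# for a one-unit increase in each feature
odds_ratios = np.exp(coefs)
print(odds_ratios)
```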

answered Sep 29 '22 by rocksteady


Logistic regression returns information in log odds, so you must first convert log odds to odds using np.exp and then take odds / (1 + odds).

To convert to probabilities, use a list comprehension and do the following:

[np.exp(x)/(1 + np.exp(x)) for x in clf.coef_[0]]

This page had an explanation in R for converting log odds that I referenced: https://sebastiansauer.github.io/convert_logit2prob/
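For example, applying that transform to the coefficients from your question (scipy.special.expit computes the same `exp(x) / (1 + exp(x))` logistic function in one step):

```python
import numpy as np
from scipy.special import expit

# Coefficients from clf.coef_ in the question
log_odds = np.array([-0.68120795, -0.19073737, -2.50511774, 0.14956844])

# expit(x) is equivalent to np.exp(x) / (1 + np.exp(x))
probs = expit(log_odds)
print(probs)
```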

answered Sep 29 '22 by Andreas Martinson