
How can I get the relative importance of features of a logistic regression for a particular prediction?

I am using logistic regression (in scikit-learn) for a binary classification problem, and I would like to be able to explain each individual prediction. More precisely, I want to predict the probability of the positive class and obtain a measure of how important each feature was for that prediction.

Using the coefficients (betas) as a measure of importance is generally a bad idea, as answered here, but I have yet to find a good alternative.

So far the best I have found are the following 3 options:

  1. Monte Carlo option: Holding all other features fixed, re-run the prediction many times, each time replacing the feature we want to evaluate with a random sample from the training set. The average of these runs establishes a baseline probability for the positive class. Then compare it with the probability of the positive class from the original run. The difference is a measure of the importance of the feature.
  2. "Leave-one-out" classifiers: To evaluate the importance of a feature, first create a model that uses all features, and then another that uses all features except the one being tested. Predict the new observation using both models. The difference between the two predictions would be the importance of the feature.
  3. Adjusted betas: Based on this answer, ranking the importance of the features by 'the magnitude of its coefficient times the standard deviation of the corresponding parameter in the data.'
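As a rough illustration of option 1, here is a minimal sketch of the Monte Carlo procedure. The helper name `monte_carlo_importance`, the synthetic data, and the number of draws are all assumptions made for this example, not part of any library; it only relies on the fitted model exposing `predict_proba`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def monte_carlo_importance(model, X_train, x, n_draws=1000, seed=0):
    """Hypothetical helper: for each feature, replace its value in x with
    random draws from the training set (all other features held fixed) and
    compare the mean resulting probability with the original prediction."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    p_original = model.predict_proba(x.reshape(1, -1))[0, 1]
    importances = np.empty(x.shape[0])
    for j in range(x.shape[0]):
        # n_draws copies of x, with only column j resampled from the training data
        X_perturbed = np.tile(x, (n_draws, 1))
        X_perturbed[:, j] = rng.choice(X_train[:, j], size=n_draws)
        p_baseline = model.predict_proba(X_perturbed)[:, 1].mean()
        importances[j] = p_original - p_baseline
    return p_original, importances


# Synthetic data (an assumption for illustration): feature 0 drives the label,
# feature 1 is pure noise.
data_rng = np.random.default_rng(42)
X_train = data_rng.normal(size=(500, 2))
y_train = (X_train[:, 0] + 0.1 * data_rng.normal(size=500) > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)

x_new = np.array([2.0, 2.0])
p_original, importances = monte_carlo_importance(model, X_train, x_new)
print("P(positive):", p_original)
print("per-feature importance:", importances)
```

In this setup the importance of feature 0 should come out clearly larger than that of the noise feature, but the numbers depend on the training distribution, which is exactly the concern raised about this option.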

All options (using betas, Monte Carlo and "Leave-one-out") seem like poor solutions to me.

  1. The Monte Carlo option depends on the distribution of the training set, and I cannot find any literature to support it.
  2. The "leave-one-out" option would be easily tricked by two correlated features (when one is absent, the other would step in to compensate, and both would be given 0 importance).
  3. The adjusted betas option sounds plausible, but I cannot find any literature to support it.
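For reference, the adjusted-beta ranking from option 3 could be computed roughly as follows. This is a sketch on synthetic data (the feature scales and true effects are assumptions chosen for the example). Note that it yields one global ranking that is the same for every prediction, rather than a per-prediction explanation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption for illustration): two features with the same
# effect per standard deviation, but measured on very different scales.
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.normal(scale=1.0, size=n),   # feature 0: standard deviation about 1
    rng.normal(scale=10.0, size=n),  # feature 1: standard deviation about 10
])
logits = 1.0 * X[:, 0] + 0.1 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

raw = np.abs(model.coef_[0])      # |beta_i|, depends on the feature's units
adjusted = raw * X.std(axis=0)    # |beta_i| * std(feature_i)

print("raw |coef|:     ", raw)
print("adjusted |coef|:", adjusted)
```

The raw coefficients differ by roughly a factor of ten only because of the units, while the adjusted values should be of similar size, which is the motivation for scaling by the standard deviation.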

Actual question: What is the best way to interpret the importance of each feature, at the moment of a decision, with a linear classifier?

Quick note #1: for Random Forests this is trivial: we can simply use the prediction + bias decomposition, as explained beautifully in this blog post. The problem here is how to do something similar with linear classifiers such as Logistic Regression.

Quick note #2: there are a number of related questions on Stack Overflow (1 2 3 4 5). I have not been able to find an answer to this specific question.

sapo_cosmico asked Dec 30 '15 12:12



1 Answer

If you want the importance of the features for a particular decision, why not simulate the decision_function (which scikit-learn provides, so you can check that you get the same value) step by step? The decision function for linear classifiers is simply:

intercept_ + coef_[0]*feature[0] + coef_[1]*feature[1] + ...

The importance of feature i is then just coef_[i]*feature[i]. Of course, this is similar to looking at the magnitude of the coefficients, but since each coefficient is multiplied by the actual feature value, and this is also what happens under the hood, it might be your best bet.
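A minimal sketch of this decomposition, assuming a binary LogisticRegression fitted on synthetic data (the data and variable names are made up for the example). In scikit-learn, coef_ has shape (1, n_features) for a binary problem, so the per-feature coefficients are in coef_[0], and the contributions are in log-odds units rather than probabilities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.normal(size=200) > 0).astype(int)
model = LogisticRegression().fit(X, y)

x = X[0]  # the observation whose prediction we want to explain

# Per-feature contribution to the decision function: coef_[i] * feature[i]
contributions = model.coef_[0] * x
score = model.intercept_[0] + contributions.sum()

# Check against scikit-learn's own decision_function, as the answer suggests
assert np.isclose(score, model.decision_function(x.reshape(1, -1))[0])

# For a binary model, the positive-class probability is the logistic
# function of the score
probability = 1.0 / (1.0 + np.exp(-score))

# Features ordered from largest to smallest absolute contribution
ranking = np.argsort(-np.abs(contributions))
print("contributions (log-odds):", contributions)
print("ranking:", ranking)
print("P(positive):", probability)
```

The sign of each contribution shows whether that feature pushed this prediction toward the positive or the negative class.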

Robin Spiess answered Nov 10 '22 08:11