I am using Logistic Regression (in scikit-learn) for a binary classification problem and am interested in being able to explain each individual prediction. To be more precise, I am interested in predicting the probability of the positive class and in having a measure of the importance of each feature for that prediction.
Using the coefficients (betas) as a measure of importance is generally a bad idea, as answered here, but I have yet to find a good alternative.
So far the best I have found are the following three options: using the betas directly, a Monte Carlo approach, and a "Leave-one-out" approach. All three seem like poor solutions to me.
Actual question: What is the best way to interpret the importance of each feature, at the moment of a decision, with a linear classifier?
Quick note #1: for Random Forests this is trivial; we can simply use the prediction + bias decomposition, as explained beautifully in this blog post. The problem here is how to do something similar with linear classifiers such as Logistic Regression.
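For reference, a rough sketch of that prediction + bias decomposition for a random forest, assuming the third-party treeinterpreter package that the blog post describes (the dataset and exact calls here are my own illustration, not taken from the post):

    # Sketch: decompose a single random-forest prediction into bias + per-feature
    # contributions using the third-party `treeinterpreter` package
    # (pip install treeinterpreter). Dataset and setup are illustrative assumptions.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from treeinterpreter import treeinterpreter as ti

    data = load_breast_cancer()
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(data.data, data.target)

    instance = data.data[:1]                      # one row to explain
    prediction, bias, contributions = ti.predict(rf, instance)

    # prediction == bias + sum of contributions (per class)
    print(prediction[0])                          # predicted class probabilities
    print(bias[0])                                # training-set baseline
    print(contributions[0].sum(axis=0))           # per-feature contributions summed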
Quick note #2: there are a number of related questions on stackoverflow (1 2 3 4 5). I have not been able to find an answer to this specific question.
Standardized coefficients and the change in R-squared when a variable is added to the model last can both help identify the more important independent variables in a regression model—from a purely statistical standpoint.
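For the standardized-coefficient part, a minimal sketch (the dataset and pipeline are my own illustration, not part of the answer): scale the features to unit variance before fitting, so the coefficient magnitudes become comparable across features. Note this gives a global ranking, not a per-prediction one.

    # Sketch: standardized coefficients for a logistic regression, obtained by
    # scaling features to unit variance before fitting so that coefficient
    # magnitudes are comparable across features (a global measure).
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    data = load_breast_cancer()
    pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    pipe.fit(data.data, data.target)

    # Coefficients on standardized features, largest magnitude first.
    coefs = pipe.named_steps["logisticregression"].coef_[0]
    for name, c in sorted(zip(data.feature_names, coefs),
                          key=lambda t: abs(t[1]), reverse=True)[:5]:
        print(f"{name}: {c:+.3f}")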
You can get the importance of each feature in your dataset by using the model's feature importance property. Feature importance gives you a score for each feature of your data; the higher the score, the more important or relevant that feature is to your output variable.
Feature importance is calculated as the decrease in node impurity weighted by the probability of reaching that node. The node probability can be calculated as the number of samples that reach the node divided by the total number of samples. The higher the value, the more important the feature.
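Note that this impurity-based score is what scikit-learn's tree-based estimators expose through feature_importances_; LogisticRegression has no such attribute. A minimal sketch, using a random forest purely for illustration:

    # Sketch: impurity-based feature importances from a tree ensemble via
    # scikit-learn's `feature_importances_` (tree-based models only).
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier

    data = load_breast_cancer()
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(data.data, data.target)

    # Scores sum to 1; higher means the feature reduced impurity more on average.
    for name, score in sorted(zip(data.feature_names, rf.feature_importances_),
                              key=lambda t: t[1], reverse=True)[:5]:
        print(f"{name}: {score:.3f}")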
If you want the importance of the features for a particular decision, why not simulate the decision_function (which is provided by scikit-learn, so you can test whether you get the same value) step by step? The decision function for linear classifiers is simply:

    intercept_ + coef_[0]*feature[0] + coef_[1]*feature[1] + ...

The importance of feature i is then just coef_[i]*feature[i]. Of course, this is similar to looking at the magnitude of the coefficients, but since each coefficient is multiplied by the actual feature value, and since this is also what happens under the hood, it might be your best bet.
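A minimal sketch of that decomposition (the dataset is my own illustration): compute coef_[i]*feature[i] for each feature, add intercept_, and check that the sum reproduces decision_function and predict_proba for that instance.

    # Sketch: per-prediction contributions for a linear classifier, reconstructing
    # scikit-learn's decision_function term by term.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression

    X, y = load_breast_cancer(return_X_y=True)
    clf = LogisticRegression(max_iter=5000).fit(X, y)

    x = X[0]                                    # the instance to explain
    contributions = clf.coef_[0] * x            # coef_[i] * feature[i] for each i
    score = clf.intercept_[0] + contributions.sum()

    # The reconstructed score matches decision_function, and the positive-class
    # probability is its logistic transform.
    assert np.isclose(score, clf.decision_function(X[:1])[0])
    prob = 1.0 / (1.0 + np.exp(-score))
    assert np.isclose(prob, clf.predict_proba(X[:1])[0, 1])

    # Largest contributions (by absolute value) to this particular decision:
    for i in np.argsort(-np.abs(contributions))[:5]:
        print(f"feature {i}: {contributions[i]:+.3f}")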