Determine WHY Features Are Important in Decision Tree Models

Oftentimes, stakeholders don't want a black-box model that is merely good at predicting; they want insights about the features so they can better understand their business and explain it to others.

When we inspect the feature importance of an xgboost or sklearn gradient boosting model, we can determine the feature importance... but we don't understand WHY the features are important, do we?

Is there a way to explain not only what features are important but also WHY they're important?

I was told to use shap, but running even some of the boilerplate examples throws errors, so I'm looking for alternatives (or even just a procedural way to inspect the trees and glean insights beyond a plot_importance() plot).

In the example below, how does one go about explaining WHY feature f19 is the most important (while also realizing that decision trees are random without a random_state or seed)?

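from xgboost import XGBClassifier, plot_importance
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt

# Synthetic binary classification data (100 samples, 20 features by default)
X, y = make_classification(random_state=68)
# Note: no random_state is set on the model itself
xgb = XGBClassifier()
xgb.fit(X, y)
# By default this plots how often each feature (f0..f19) is used to split
plot_importance(xgb)
plt.show()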

[Image: feature_importance plot]

Update: What I'm looking for is a programmatic, procedural proof that the features chosen by the model above contribute either positively or negatively to the predictive power. I want to see code (not theory) showing how you would inspect the actual model and determine each feature's positive or negative contribution. Currently, I maintain that it's not possible, so somebody please prove me wrong. I'd love to be wrong!

I also understand that decision trees are non-parametric and have no coefficients. Still, is there a way to see whether a feature contributes positively (increasing this feature by one unit increases y) or negatively (increasing this feature by one unit decreases y)?

Update2: Despite a thumbs down on this question, and several "close" votes, it seems this question isn't so crazy after all. Partial dependence plots might be the answer.

Partial Dependence Plots (PDP) were introduced by Friedman (2001) with the purpose of interpreting complex Machine Learning algorithms. Interpreting a linear regression model is not as complicated as interpreting Support Vector Machine, Random Forest or Gradient Boosting Machine models; this is where Partial Dependence Plots come into use. For some statistical explanation you can refer here and More Advance. Some of the algorithms have methods for finding variable importance, but they do not express whether a variable is positively or negatively affecting the model.

433

asked Nov 04 '17 01:11

Jarad

1 Answer

tldr; http://scikit-learn.org/stable/auto_examples/ensemble/plot_partial_dependence.html

I'd like to clear up some of the wording to make sure we're on the same page.

Predictive power: which features contribute significantly to the prediction
Feature dependence: whether a feature is positively or negatively related to the response, i.e., does a change in the feature X cause the prediction y to increase or decrease

1. Predictive power

Your feature importance plot shows which features retain the most information, i.e., the most significant features. Power could also mean what causes the biggest change in the prediction; to check that, you would have to plug in dummy values and see their overall impact, much like you would with linear regression coefficients.
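
One way to approximate the "plug in different values and watch the impact" check is permutation importance, which is a related but different technique: it shuffles one column at a time and measures how much the held-out score drops. The sketch below is only an illustration under assumptions that are not in the question: it swaps in scikit-learn's GradientBoostingClassifier for XGBClassifier, adds a train/test split, and picks n_repeats=10 arbitrarily.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Same synthetic data as in the question (20 features by default).
X, y = make_classification(random_state=68)

# Assumption: hold out a test set so the score drop is measured on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Assumption: a scikit-learn GBM stands in for the XGBoost model.
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each column n_repeats times and record the drop in accuracy.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)

# Rank features from largest to smallest mean score drop.
ranking = result.importances_mean.argsort()[::-1]
for idx in ranking[:5]:
    mean = result.importances_mean[idx]
    std = result.importances_std[idx]
    print(f"f{idx}: {mean:.3f} +/- {std:.3f}")
```

This only tells you how much the model relies on a feature, not the direction of its effect.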

2. Correlation/Dependence

As pointed out by @Tiago1984, it depends heavily on the underlying algorithm. XGBoost/GBM additively build a committee of stumps (decision trees with a low number of splits, usually only one).

In a regression problem, the trees are typically using a criterion related to the MSE. I won't go into the full details, but you can read more here: https://medium.com/towards-data-science/boosting-algorithm-gbm-97737c63daa3.

You'll see that at each step it calculates a vector for the "direction" of the weak learner, so in principle you know the direction of its influence (but keep in mind that a feature may appear many times in one tree and in multiple steps of the additive model).
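
For the special case where every tree really is a stump, the direction of each split can be read straight off the fitted trees. The following is a rough sketch rather than a general method: it assumes scikit-learn's GradientBoostingClassifier with max_depth=1 (not the XGBoost model from the question), and the names net_effect and split_count are made up for this illustration.

```python
from collections import defaultdict

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(random_state=68)

# Assumption: max_depth=1 forces every boosting stage to be a single split.
model = GradientBoostingClassifier(max_depth=1, random_state=0).fit(X, y)

net_effect = defaultdict(float)  # hypothetical name: summed (right - left) leaf values
split_count = defaultdict(int)   # hypothetical name: number of stumps using the feature

# For a binary target, estimators_ has shape (n_estimators, 1).
for stage in model.estimators_:
    tree = stage[0].tree_
    if tree.node_count < 3:
        continue  # this stage made no split at all
    feature = int(tree.feature[0])  # feature tested at the root
    left = float(tree.value[tree.children_left[0], 0, 0])    # rows with x <= threshold
    right = float(tree.value[tree.children_right[0], 0, 0])  # rows with x > threshold
    net_effect[feature] += right - left
    split_count[feature] += 1

# A positive total means larger feature values push the raw score
# (log-odds of class 1) up; a negative total means they push it down.
for feature in sorted(split_count, key=split_count.get, reverse=True):
    print(f"f{feature}: {split_count[feature]} splits, "
          f"net effect {net_effect[feature]:+.3f}")
```

With deeper trees the same feature can push the prediction in different directions depending on the other splits along the path, so this per-stump summary no longer applies.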

But, to cut to the chase: you could just fix all your features apart from f19, make predictions for a range of f19 values, and see how they relate to the response value.
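
A minimal sketch of that procedure is shown here. It assumes that "f19" in the importance plot refers to column index 19, holds the other features at their medians (one arbitrary choice of "fixed" values), and uses scikit-learn's GradientBoostingClassifier in place of the XGBoost model; any classifier with predict_proba should work the same way.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(random_state=68)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

feature = 19  # assumption: "f19" in the plot is column index 19

# Range of values to try for the chosen feature.
grid = np.linspace(X[:, feature].min(), X[:, feature].max(), num=25)

# Hold every other feature at its median and vary only the chosen one.
baseline = np.median(X, axis=0)
probe = np.tile(baseline, (len(grid), 1))
probe[:, feature] = grid

# Predicted probability of class 1 for each probe row.
proba = model.predict_proba(probe)[:, 1]
for value, p in zip(grid, proba):
    print(f"f{feature} = {value:+.2f} -> P(y=1) = {p:.3f}")

# Crude direction check: compare the two ends of the sweep.
print("change across the range:", proba[-1] - proba[0])
```

The result depends on where the other features are held, so it describes the effect around one reference point only.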

Take a look at partial dependence plots: http://scikit-learn.org/stable/auto_examples/ensemble/plot_partial_dependence.html
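
As a sketch of the same idea without plotting, recent scikit-learn versions expose partial_dependence in sklearn.inspection, which averages the model's response over the whole dataset instead of fixing the other features at a single point. The example assumes column index 19 corresponds to f19, uses GradientBoostingClassifier rather than XGBoost, and picks grid_resolution=20 arbitrarily.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import partial_dependence

X, y = make_classification(random_state=68)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Assumption: column index 19 is the "f19" from the importance plot.
pd_result = partial_dependence(
    model, X, features=[19], kind="average", grid_resolution=20
)

# One row per model output; a binary classifier has a single output.
averaged = pd_result["average"][0]
print(averaged)

# Crude direction check: response at the high end minus the low end of the grid.
print("change across the grid:", averaged[-1] - averaged[0])
```

For gradient boosting models the averaged values are on the scale of the model's raw decision function, so read them for shape and direction rather than as probabilities.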

There's also a section on it in The Elements of Statistical Learning, Section 10.13.2.

123

answered Nov 14 '22 22:11

jonnybazookatone

Tags:

python

machine-learning

scikit-learn

decision-tree

xgboost
