Using scikit to determine contributions of each feature to a specific class prediction

I am using a scikit-learn extra trees classifier:

    model = ExtraTreesClassifier(n_estimators=10000, n_jobs=-1, random_state=0)

Once the model is fitted and used to predict classes, I would like to find out the contribution of each feature to a specific class prediction. How do I do that in scikit-learn? Is it possible with an extra trees classifier, or do I need to use some other model?

asked Feb 07 '16 04:02 by user308827


2 Answers

Update

Being more knowledgeable about ML today than I was 2.5 years ago, I will now say that this approach only works for highly linear decision problems. If you carelessly apply it to a non-linear problem, you will have trouble.

Example: Imagine a feature for which neither very large nor very small values predict a class, but values in some intermediate interval do. That could be water intake to predict dehydration. But water intake probably interacts with salt intake, as eating more salt allows for a greater water intake. Now you have an interaction between two non-linear features. The decision boundary meanders around your feature-space to model this non-linearity and to ask only how much one of the features influences the risk of dehydration is simply ignorant. It is not the right question.

Alternative: Another, more meaningful, question you could ask is: if I didn't have this information (if I left out this feature), how much would my prediction of a given label suffer? To answer it, you simply leave out a feature, train a model, and look at how much precision and recall drop for each of your classes. This still tells you about feature importance, but it makes no assumptions about linearity.
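The leave-one-feature-out idea above can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not a definitive recipe: the iris data, the 100-tree `ExtraTreesClassifier`, the single train/test split, and the helper names `per_class_scores` and `drop_feature_report` are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split


def per_class_scores(X_train, X_test, y_train, y_test, labels):
    """Fit a fresh model and return per-class (precision, recall) arrays."""
    model = ExtraTreesClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    precision, recall, _, _ = precision_recall_fscore_support(
        y_test, model.predict(X_test), labels=labels, zero_division=0
    )
    return precision, recall


def drop_feature_report(X, y):
    """Map feature index -> (precision drop, recall drop), one value per class."""
    labels = np.unique(y)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=0, stratify=y
    )
    base_p, base_r = per_class_scores(X_train, X_test, y_train, y_test, labels)

    report = {}
    for j in range(X.shape[1]):
        # Retrain without column j and compare against the full model.
        p, r = per_class_scores(
            np.delete(X_train, j, axis=1), np.delete(X_test, j, axis=1),
            y_train, y_test, labels,
        )
        report[j] = (base_p - p, base_r - r)
    return report


iris = load_iris()
report = drop_feature_report(iris.data, iris.target)
for j, (dp, dr) in report.items():
    print(iris.feature_names[j], "precision drop:", dp.round(3), "recall drop:", dr.round(3))
```

A positive drop for a class means the model got worse at that class without the feature. With a single split the numbers are noisy, so in practice you would probably average over several splits or use cross-validation.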

Below is the old answer.


I worked through a similar problem a while back and posted the same question on Cross Validated. The short answer is that there is no implementation in sklearn that does all of what you want.

However, what you are trying to achieve is really quite simple, and can be done by multiplying the mean standardised value of each feature, computed separately for each predicted class, by the corresponding element of the model.feature_importances_ array. You can write a simple function that standardises your dataset, computes the mean of each feature within each predicted class, and does element-wise multiplication with the model.feature_importances_ array. The greater the absolute resulting values are, the more important the features will be to their predicted class, and better yet, the sign will tell you whether it is small or large values that are important.

Here's a super simple implementation that takes a data matrix X, a list of predictions Y and an array of feature importances, and returns a nested dict (printed as JSON below) describing the importance of each feature to each class.

    import numpy as np
    from sklearn.preprocessing import scale

    def class_feature_importance(X, Y, feature_importances):
        N, M = X.shape     # N samples, M features
        X = scale(X)       # standardise each feature column
        Y = np.asarray(Y)  # allow Y to be a plain list

        out = {}
        for c in set(Y):
            # mean standardised value of each feature within class c,
            # weighted by that feature's importance
            out[c] = dict(
                zip(range(M), np.mean(X[Y == c, :], axis=0) * feature_importances)
            )

        return out

Example:

    import json

    import numpy as np
    from sklearn.preprocessing import scale

    X = np.array([[ 2,  2,  2,  0,  3, -1],
                  [ 2,  1,  2, -1,  2,  1],
                  [ 0, -3,  0,  1, -2,  0],
                  [-1, -1,  1,  1, -1, -1],
                  [-1,  0,  0,  2, -3,  1],
                  [ 2,  2,  2,  0,  3,  0]], dtype=float)

    Y = np.array([0, 0, 1, 1, 1, 0])
    feature_importances = np.array([0.1, 0.2, 0.3, 0.2, 0.1, 0.1])
    # feature_importances = model.feature_importances_

    result = class_feature_importance(X, Y, feature_importances)

    # Cast the class labels to str: json.dumps rejects NumPy integer keys.
    print(json.dumps({str(c): v for c, v in result.items()}, indent=4))

Output:

    {
        "0": {
            "0": 0.097014250014533204,
            "1": 0.16932975630904751,
            "2": 0.27854300726557774,
            "3": -0.17407765595569782,
            "4": 0.0961523947640823,
            "5": 0.0
        },
        "1": {
            "0": -0.097014250014533177,
            "1": -0.16932975630904754,
            "2": -0.27854300726557779,
            "3": 0.17407765595569782,
            "4": -0.0961523947640823,
            "5": 0.0
        }
    }

The first level of keys in result are class labels, and the second level of keys are column indices, i.e. feature indices. Recall that large absolute values correspond to importance, and the sign tells you whether it's small (possibly negative) or large values that matter.
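To connect this back to the question, here is a rough sketch of feeding a fitted extra trees model into the same calculation. The iris data, the 100-tree model, and the compact re-statement of `class_feature_importance` are assumptions made so the example runs on its own; the caveats from the update at the top of this answer still apply.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.preprocessing import scale


def class_feature_importance(X, Y, feature_importances):
    # Same idea as the function above: per-class mean of the standardised
    # features, weighted by the model's global importances.
    X = scale(X)
    Y = np.asarray(Y)
    return {
        c: dict(zip(range(X.shape[1]), X[Y == c].mean(axis=0) * feature_importances))
        for c in np.unique(Y).tolist()
    }


iris = load_iris()
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(iris.data, iris.target)

# Group samples by the class the model predicts for them.
predicted = model.predict(iris.data)
result = class_feature_importance(iris.data, predicted, model.feature_importances_)

for label, scores in result.items():
    # Feature with the largest absolute score for this class.
    top = max(scores, key=lambda j: abs(scores[j]))
    print(iris.target_names[label], "->", iris.feature_names[top], round(scores[top], 3))
```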

answered Oct 02 '22 16:10 by Ulf Aslak


This is modified from the docs:

    from sklearn import datasets
    from sklearn.ensemble import ExtraTreesClassifier

    iris = datasets.load_iris()  # sample data
    X, y = iris.data, iris.target

    model = ExtraTreesClassifier(n_estimators=10000, n_jobs=-1, random_state=0)
    model.fit(X, y)  # fit the model to the dataset

I think feature_importances_ is what you're looking for:

    In [13]: model.feature_importances_
    Out[13]: array([ 0.09523045,  0.05767901,  0.40150422,  0.44558631])

EDIT

Maybe I misunderstood the first time (pre-bounty), sorry; this may be more along the lines of what you are looking for. There is a Python library called treeinterpreter that produces the information I think you are looking for. You'll have to use the basic DecisionTreeClassifier (or Regressor). Following along from this blog post, you can access the feature contributions to the prediction of each individual instance:

    from sklearn import datasets
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    from treeinterpreter import treeinterpreter as ti

    iris = datasets.load_iris()  # sample data
    X, y = iris.data, iris.target

    # split into training and test
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=0)

    # fit the model on the training set
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X_train, y_train)

I'll just iterate through each sample in X_test for illustrative purposes; this almost exactly mimics the blog post above:

    for test_sample in range(len(X_test)):
        prediction, bias, contributions = ti.predict(
            model, X_test[test_sample].reshape(1, 4))
        print("Class Prediction", prediction)
        print("Bias (trainset prior)", bias)

        # now extract contributions for each instance
        for c, feature in zip(contributions[0], iris.feature_names):
            print(feature, c)

        print('\n')

The first iteration of the loop yields:

    Class Prediction [[ 0.  0.  1.]]
    Bias (trainset prior) [[ 0.34  0.31  0.35]]
    sepal length (cm) [ 0.  0.  0.]
    sepal width (cm) [ 0.  0.  0.]
    petal length (cm) [ 0.         -0.43939394  0.43939394]
    petal width (cm) [-0.34        0.12939394  0.21060606]

Interpreting this output, it seems as though petal length and petal width were the most important contributors to the prediction of the third class (for the first sample). Hope this helps.
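To see where numbers of this kind can come from, here is a minimal sketch of a decision-path decomposition written against scikit-learn only. It is an illustration under my own assumptions, not treeinterpreter's actual code: `path_contributions` is a hypothetical helper that walks the path a sample takes through the tree and credits each change in the class distribution to the feature that was split on.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def path_contributions(model, x):
    """Return (bias, contributions) for one sample x of shape (n_features,)."""
    tree = model.tree_
    # Class distribution at every node, normalised so each row sums to 1.
    values = tree.value[:, 0, :]
    values = values / values.sum(axis=1, keepdims=True)

    # Node ids visited by x, ordered from the root to the leaf.
    path = model.decision_path(x.reshape(1, -1)).indices

    contributions = np.zeros((len(x), values.shape[1]))
    for parent, child in zip(path[:-1], path[1:]):
        # Credit the change in class distribution to the feature split on at the parent.
        contributions[tree.feature[parent]] += values[child] - values[parent]

    # The root node (id 0) holds the training-set class prior, i.e. the bias.
    return values[0], contributions


iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

bias, contributions = path_contributions(model, X_test[0])
print("Bias (trainset prior)", bias)
for name, c in zip(iris.feature_names, contributions):
    print(name, c)
```

By construction, the bias plus the sum of all feature contributions equals the class probabilities the tree predicts for that sample, which is what makes the per-feature numbers interpretable as a breakdown of one prediction.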

answered Oct 02 '22 14:10 by Kevin