Can I show feature importance for MultiOutputClassifier?

Tags: python, scikit-learn, random-forest

I'm trying to recover the feature importances of a MultiOutputClassifier that wraps a RandomForestClassifier.

Fitting the MultiOutputClassifier works without problems:

import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier

## Generate data
x, y = make_multilabel_classification(n_samples=1000,
                                      n_features=15,
                                      n_labels=5,
                                      n_classes=3,
                                      random_state=12,
                                      allow_unlabeled=True)
# First 700 rows for training, remaining 300 for testing
x_train = x[:700, :]
x_test  = x[700:, :]
y_train = y[:700, :]
y_test  = y[700:, :]

## Generate model
forest = RandomForestClassifier(n_estimators=100, random_state=1)
multi_forest = MultiOutputClassifier(forest, n_jobs=-1).fit(x_train, y_train)

## Make prediction
# predict_proba returns a list with one probability array per output label
dfOutput_multi_forest = multi_forest.predict_proba(x_test)

The prediction dfOutput_multi_forest looks fine, but I want to recover the feature importances of multi_forest so that I can interpret the output.

Using multi_forest.feature_importance_ throws the following error message: AttributeError: 'MultiOutputClassifier' object has no attribute 'feature_importance_'

Does anyone know how to retrieve the feature importances? I'm using scikit-learn v0.20.2.

asked Feb 06 '19 20:02

PaulH

1 Answer

Indeed, scikit-learn's MultiOutputClassifier has no attribute that aggregates the feature importances of its underlying estimators (in your case, the RandomForest classifiers). Note also that the attribute on those estimators is spelled feature_importances_ (plural), not feature_importance_; MultiOutputClassifier exposes neither.

However, it is possible to access the feature importances of each RandomForest classifier, and then average them all together to give you each feature's average importance, across all RandomForest classifiers.

A fitted MultiOutputClassifier has an attribute called estimators_. If you run multi_forest.estimators_, you will get a list containing one fitted RandomForest classifier per output label (three in your case).

For each of these RandomForest classifier objects, you can access its feature importances through the feature_importances_ attribute.
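As a quick check of the two attributes described above, here is a minimal sketch that refits the model from the question and inspects estimators_. It assumes the same synthetic data and forest settings as the question; n_jobs is left at its default purely to keep the example simple.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

# Same synthetic data as in the question: 15 features, 3 output labels.
x, y = make_multilabel_classification(n_samples=1000, n_features=15,
                                      n_labels=5, n_classes=3,
                                      random_state=12, allow_unlabeled=True)

forest = RandomForestClassifier(n_estimators=100, random_state=1)
multi_forest = MultiOutputClassifier(forest).fit(x[:700], y[:700])

# One fitted forest per column of y.
n_fitted = len(multi_forest.estimators_)

# Each forest reports one importance value per input feature.
shapes = [est.feature_importances_.shape for est in multi_forest.estimators_]

print(n_fitted, shapes)  # 3 [(15,), (15,), (15,)]
```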

To put it all together, this was my approach:

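# Collect the importances of the forest fitted for each output label
feat_impts = []
for clf in multi_forest.estimators_:
    feat_impts.append(clf.feature_importances_)

# Average across the forests: one value per feature
np.mean(feat_impts, axis=0)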

I ran the example code from your question, and then ran the block above, which output the following array of 15 averages:

array([0.09830467, 0.0912088 , 0.05738045, 0.1211305 , 0.03901933,
       0.05429491, 0.06929378, 0.06404416, 0.05676634, 0.04919717,
       0.05244265, 0.0509295 , 0.05615341, 0.09202444, 0.04780991])

This array contains the average importance of each of your 15 features across the 3 random forest classifiers used in your MultiOutputClassifier.

This should at least help you see which features, on the whole, tended to be more important when making predictions across your 3 output labels.
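Because an average can hide a feature that matters for only one label, it may also help to keep the per-label importances next to the mean. The following is a minimal sketch under the same assumptions as the question's setup; the names per_label, mean_importance and ranking are illustrative only, and n_jobs is left at its default.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

# Same synthetic data and model settings as in the question.
x, y = make_multilabel_classification(n_samples=1000, n_features=15,
                                      n_labels=5, n_classes=3,
                                      random_state=12, allow_unlabeled=True)
forest = RandomForestClassifier(n_estimators=100, random_state=1)
multi_forest = MultiOutputClassifier(forest).fit(x[:700], y[:700])

# Rows are output labels, columns are features: shape (3, 15).
per_label = np.vstack([est.feature_importances_
                       for est in multi_forest.estimators_])

# Mean importance of each feature across the three forests.
mean_importance = per_label.mean(axis=0)

# Feature indices sorted from most to least important on average.
ranking = np.argsort(mean_importance)[::-1]

# Show the top five features with their per-label breakdown.
for idx in ranking[:5]:
    print(f"feature {idx}: mean={mean_importance[idx]:.3f}, "
          f"per label={np.round(per_label[:, idx], 3)}")
```

Each forest's importances sum to 1, so the averaged values do too, which makes them directly comparable as shares of total importance.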

answered Oct 07 '22 11:10

James Dellinger
