 

How to get attribute list from fitted model in Scikit-learn?

Is there any way to get a list of features (attributes) from a fitted model in Scikit-learn (or the whole table of training data used)? I am using some preprocessing such as feature selection, and I would like to know which features were selected and which were removed. For example, I use a Random Forest classifier and Recursive Feature Elimination.

Asked Aug 27 '15 by Bohemiak

People also ask

What does the Fit () method in scikit-learn do?

The fit() method takes the training data as arguments, which can be one array in the case of unsupervised learning, or two arrays in the case of supervised learning.
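
For instance, here is a minimal sketch (my own example, not from the original page) contrasting the two call signatures, with KMeans for the unsupervised case and LogisticRegression for the supervised one:

from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# Unsupervised learning: fit() takes only the data matrix X
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Supervised learning: fit() takes the data matrix X and the targets y
clf = LogisticRegression().fit(X, y)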

How is model information stored in an estimator during fitting?

We mostly use attribute to refer to how model information is stored on an estimator during fitting. Any public attribute stored on an estimator instance is required to begin with an alphabetic character and end in a single underscore if it is set in fit or partial_fit.
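
A quick illustration (my own sketch, using LinearRegression as an arbitrary estimator): fitted attributes such as coef_ and intercept_ only exist after fit() has been called.

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])

reg = LinearRegression()
# hasattr(reg, "coef_") is False here, before fitting
reg.fit(X, y)
print(reg.coef_)       # learned coefficients, set during fit
print(reg.intercept_)  # learned intercept, set during fit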

What is sklearn Feature_extraction?

The sklearn.feature_extraction module can be used to extract features in a format supported by machine learning algorithms from datasets consisting of formats such as text and image.
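
As a hedged example (assuming CountVectorizer from sklearn.feature_extraction.text, which is part of that module), turning raw text into a numeric document-term matrix looks like this:

from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat", "the dog barked"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # sparse document-term matrix
# get_feature_names_out() in scikit-learn >= 1.0; get_feature_names() before that
print(vectorizer.get_feature_names_out())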

How does SelectFromModel work?

max_features is the maximum number of features to select. If an integer, it specifies the maximum number of features to allow. If a callable, it specifies how to calculate the maximum number of features allowed by using the output of max_features(X). If None, then all features are kept.
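
A small sketch of SelectFromModel with max_features as an integer (my own example; setting threshold to -np.inf selects on max_features alone):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=100, n_features=10, random_state=0)

# Keep at most 3 features, ranked by the forest's feature_importances_
selector = SelectFromModel(RandomForestClassifier(random_state=0),
                           max_features=3, threshold=-np.inf)
X_new = selector.fit_transform(X, y)
print(selector.get_support())  # boolean mask of the selected features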


1 Answer

A mask of the selected features is stored in the support_ attribute of the RFE object.

See the doc here: http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html#sklearn.feature_selection.RFE

Here is an example:

from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

# load a dataset
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

# recursively eliminate features until 5 remain
estimator = SVR(kernel="linear")
selector = RFE(estimator, n_features_to_select=5, step=1)
X_new = selector.fit_transform(X, y)

print(selector.support_)   # boolean mask of the selected features
print(selector.ranking_)   # feature ranking (selected features are ranked 1)

This will display:

[ True  True  True  True  True False False False False False]
[1 1 1 1 1 6 4 3 2 5]

Note that if you want to use a random forest classifier in an RFE model, you'll get this error:

AttributeError: 'RandomForestClassifier' object has no attribute 'coef_'

I found a workaround in this thread: Recursive feature elimination on Random Forest using scikit-learn

You have to override the RandomForestClassifier class like this:

from sklearn.ensemble import RandomForestClassifier

class RandomForestClassifierWithCoef(RandomForestClassifier):
    def fit(self, *args, **kwargs):
        super(RandomForestClassifierWithCoef, self).fit(*args, **kwargs)
        # RFE looks for coef_, so mirror the forest's feature_importances_
        self.coef_ = self.feature_importances_
        return self
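
For completeness, here is a usage sketch (my addition, assuming the RandomForestClassifierWithCoef class defined above) plugging the wrapped classifier into RFE. Note that recent scikit-learn releases can fall back to feature_importances_ on their own, so whether you still need the wrapper depends on your version:

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=100, n_features=10, random_state=0)

rf = RandomForestClassifierWithCoef(n_estimators=100, random_state=0)
selector = RFE(rf, n_features_to_select=5, step=1)
selector.fit(X, y)
print(selector.support_)  # boolean mask of the selected features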

Hope it helps :)

Answered Oct 26 '22 by Paul Edouard