Is there any way to get the list of features (attributes) used by a fitted model in scikit-learn (or the whole table of training data that was used)? I am using preprocessing such as feature selection, and I would like to know which features were selected and which were removed. For example, I use a Random Forest Classifier together with Recursive Feature Elimination.
The fit() method takes the training data as arguments, which can be one array in the case of unsupervised learning, or two arrays in the case of supervised learning.
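For instance (a minimal sketch; KMeans and SVC stand in here for any unsupervised/supervised estimator):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

X = np.array([[0.0], [1.0], [10.0], [11.0]])
y = [0, 0, 1, 1]

km = KMeans(n_clusters=2, n_init=10).fit(X)  # unsupervised: one array
clf = SVC().fit(X, y)                        # supervised: data and labels
```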
In scikit-learn, "attribute" mostly refers to model information stored on an estimator during fitting. Any public attribute set in fit or partial_fit is required to begin with an alphabetic character and end in a single trailing underscore.
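That naming convention is what you can rely on when inspecting a fitted model; a small sketch:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

# Fitted attributes follow the trailing-underscore convention:
print(clf.n_features_in_)             # number of input features
print(clf.feature_importances_.sum()) # importances sum to 1.0
```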
The sklearn.feature_extraction module can be used to extract features in a format supported by machine learning algorithms from datasets consisting of formats such as text and image.
The maximum number of features to select. If an integer, then it specifies the maximum number of features to allow. If a callable, then it specifies how to calculate the maximum number of features allowed by using the output of max_features(X). If None, then all features are kept.
A mask of the selected features is stored in the support_ attribute of the fitted RFE object.
See the doc here: http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html#sklearn.feature_selection.RFE
Here is an example:
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.svm import SVR
# load a dataset
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
estimator = SVR(kernel="linear")
selector = RFE(estimator, n_features_to_select=5, step=1)
X_new = selector.fit_transform(X, y)
print(selector.support_)
print(selector.ranking_)
Will display:
[ True  True  True  True  True False False False False False]
[1 1 1 1 1 6 4 3 2 5]
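To recover which original columns survived, index an array of feature names with that boolean mask (the x0…x9 names here are made up for illustration):

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
feature_names = np.array([f"x{i}" for i in range(X.shape[1])])  # hypothetical names

selector = RFE(SVR(kernel="linear"), n_features_to_select=5, step=1).fit(X, y)

print(feature_names[selector.support_])    # features that were kept
print(feature_names[~selector.support_])   # features that were eliminated
print(selector.get_support(indices=True))  # integer indices of kept features
```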
Note that if you want to use a random forest classifier in an RFE model on an older scikit-learn release (recent versions fall back to feature_importances_ automatically), you'll get this error:
AttributeError: 'RandomForestClassifier' object has no attribute 'coef_'
I found a workaround in this thread: Recursive feature elimination on Random Forest using scikit-learn
You have to override the RandomForestClassifier class like this:
from sklearn.ensemble import RandomForestClassifier

class RandomForestClassifierWithCoef(RandomForestClassifier):
    def fit(self, *args, **kwargs):
        super(RandomForestClassifierWithCoef, self).fit(*args, **kwargs)
        # expose feature_importances_ under the name RFE looks for
        self.coef_ = self.feature_importances_
        return self
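A self-contained usage sketch (the subclass is repeated so the snippet runs on its own; the n_estimators and n_features values are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

class RandomForestClassifierWithCoef(RandomForestClassifier):
    def fit(self, *args, **kwargs):
        super().fit(*args, **kwargs)
        self.coef_ = self.feature_importances_  # alias RFE can read
        return self

X, y = make_classification(n_samples=100, n_features=8, n_informative=3,
                           random_state=0)
rfe = RFE(RandomForestClassifierWithCoef(n_estimators=50, random_state=0),
          n_features_to_select=4, step=1).fit(X, y)
print(rfe.support_)  # boolean mask with 4 True entries
```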
Hope it helps :)