Retrieve the list of training feature names from a classifier

Question

Is there a way to retrieve the list of feature names used to train a classifier once it has been fitted with the fit method? I would like to get this information before applying the model to unseen data. The training data is a pandas DataFrame, and in my case the classifier is a RandomForestClassifier.

Keith · Accepted Answer

I have a solution that works but is not very elegant. This is an old post with no existing answers, so I suppose there is no built-in way.

Create and fit your model, for example:

from sklearn.ensemble import GradientBoostingRegressor
model = GradientBoostingRegressor(**params)  # params: your dict of hyperparameters
model.fit(X_train, y_train)

Then add an attribute, feature_names, holding the column names, since you know them at training time:

model.feature_names = list(X_train.columns.values)
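A self-contained sketch of the steps so far, using the RandomForestClassifier from the question instead of GradientBoostingRegressor. The DataFrame, its column names (age, income, score) and the target are made-up toy data, and feature_names is a custom attribute we attach ourselves, not something scikit-learn defines.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy training data: three numeric feature columns.
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.normal(size=(100, 3)), columns=["age", "income", "score"])
y_train = (X_train["income"] > 0).astype(int)

model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X_train, y_train)

# Attach the training column names as a custom attribute (not a scikit-learn API).
model.feature_names = list(X_train.columns)
print(model.feature_names)  # ['age', 'income', 'score']
```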

I typically then save the model to a binary file with joblib so I can pass it around, but you can skip this step:

import joblib
joblib.dump(model, filename)  # filename: any path, e.g. "model.joblib"
loaded_model = joblib.load(filename)
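A minimal sketch of that round trip, assuming joblib is installed (it is a required scikit-learn dependency). It writes to a temporary directory instead of a real path, and the toy data is hypothetical. The point is that the custom attribute survives, because pickling stores the estimator's instance attributes along with the fitted trees.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Small hypothetical model with the custom feature_names attribute attached.
X_train = pd.DataFrame({"a": [0, 1, 2, 3], "b": [1, 0, 1, 0]})
y_train = [0, 0, 1, 1]
model = RandomForestClassifier(n_estimators=5, random_state=0).fit(X_train, y_train)
model.feature_names = list(X_train.columns)

# Dump to a temporary file and load it back.
with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "model.joblib")
    joblib.dump(model, filename)
    loaded_model = joblib.load(filename)

print(loaded_model.feature_names)  # ['a', 'b']
```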

Then read the feature names back from the loaded model and use them to select (and order) the columns when you predict:

f_names = loaded_model.feature_names
loaded_model.predict(X_pred[f_names])
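A sketch of why selecting by the stored names helps, on hypothetical toy data: the unseen DataFrame has its columns in a different order plus an extra column, and indexing with the saved list drops the extra column and restores the training order. As an aside, scikit-learn 1.0 and later also records the column names itself in feature_names_in_ when the model is fit on a DataFrame whose column names are all strings; the getattr fallback to the custom attribute covers older versions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

X_train = pd.DataFrame({"a": [0, 1, 2, 3, 4, 5], "b": [5, 3, 1, 4, 2, 0]})
y_train = [0, 0, 0, 1, 1, 1]
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)
model.feature_names = list(X_train.columns)

# Hypothetical unseen data: columns in a different order plus an extra column.
X_pred = pd.DataFrame({"extra": [9, 9], "b": [4, 1], "a": [1, 4]})

# Selecting by the stored names drops "extra" and restores the training order.
X_aligned = X_pred[model.feature_names]
print(list(X_aligned.columns))  # ['a', 'b']
preds = model.predict(X_aligned)

# scikit-learn >= 1.0 stores the names itself; fall back to the custom attribute otherwise.
names = getattr(model, "feature_names_in_", model.feature_names)
print(list(names))  # ['a', 'b']
```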

Adam Jermann · Answer

Based on the documentation and previous experience, there is no built-in attribute that lists the features used in at least one split.
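Although there is no dedicated attribute, a fitted forest does expose its trees through estimators_, and each tree's tree_.feature array stores the index of the feature tested at every node (leaves hold a negative placeholder). A rough sketch on made-up data, with a deliberately constant column that can never be split on:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data: "signal" drives the target, "const" never varies.
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.normal(size=(200, 3)), columns=["signal", "noise", "const"])
X_train["const"] = 1.0
y_train = (X_train["signal"] > 0).astype(int)

model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)

# Collect every feature index that appears at an internal (split) node.
used = set()
for tree in model.estimators_:
    used.update(int(i) for i in tree.tree_.feature if i >= 0)

used_names = [X_train.columns[i] for i in sorted(used)]
print(used_names)
```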

Is your concern that you do not want to use all your features for prediction, just the ones actually used in training? In that case I suggest listing feature_importances_ after fitting and eliminating the features that do not seem relevant. Then train a new model with only the relevant features and use those same features for prediction as well.
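A sketch of that approach on made-up data. The cutoff used here (keep features whose impurity-based importance is above the mean) is an arbitrary assumption for illustration; choose a threshold that suits your data. The retrained model is then given exactly the same column subset at prediction time.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data: two informative columns and two pure-noise columns.
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.normal(size=(300, 4)), columns=["f1", "f2", "noise1", "noise2"])
y_train = ((X_train["f1"] + X_train["f2"]) > 0).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Rank features by impurity-based importance.
importances = pd.Series(model.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False))

# Keep features above an illustrative cutoff: the mean importance.
relevant = importances[importances > importances.mean()].index.tolist()

# Retrain on the relevant columns and predict with the same columns.
reduced = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train[relevant], y_train)
X_pred = pd.DataFrame(rng.normal(size=(5, 4)), columns=X_train.columns)
preds = reduced.predict(X_pred[relevant])
print(preds)
```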

Tags: python, pandas, scikit-learn, random-forest

user6903745
