Before building a model I scale the features like this:

from sklearn.preprocessing import StandardScaler

X = StandardScaler(with_mean=False, with_std=True).fit_transform(X)
and afterwards build a feature importance plot:

import xgboost as xgb
import matplotlib.pyplot as plt

xgb.plot_importance(bst, color='red')
plt.title('importance', fontsize=20)
plt.yticks(fontsize=10)
plt.ylabel('features', fontsize=20)
The problem is that, instead of the feature names, the plot shows f0, f1, f2, f3 and so on. How can I get the real feature names back?

Thanks!
Feature importance (variable importance) describes which features are relevant. It can help with a better understanding of the problem being solved and can sometimes lead to model improvements through feature selection.

Feature importance refers to techniques that assign a score to each input feature of a given model; the scores simply represent the "importance" of each feature. A higher score means that the feature has a larger effect on the model's predictions.

Note that feature importance is only defined for tree boosters, i.e. when a decision tree is chosen as the base learner (booster=gbtree). It is not defined for other base learner types, such as the linear learner (booster=gblinear).
The XGBoost library provides a built-in function to plot features ordered by their importance. However, when the model is trained on data that carries no column names (for example a NumPy array, which is exactly what StandardScaler.fit_transform returns), the features are automatically named f0, f1, f2, ... according to their index.
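For instance, here is a minimal sketch (synthetic data and hypothetical parameters, just for illustration) showing that a tree booster trained on a bare NumPy array only knows these positional names:

import numpy as np
import xgboost as xgb

# A NumPy array carries no column names.
X = np.random.rand(100, 3)
y = np.random.randint(2, size=100)

dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({'booster': 'gbtree', 'objective': 'binary:logistic'},
                dtrain, num_boost_round=10)

# The importance dict is keyed by the generated positional names.
print(bst.get_fscore())  # e.g. {'f0': 12, 'f2': 7, 'f1': 5}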
First, get the list of feature names before preprocessing, while X is still the original pandas DataFrame with named columns:
dtrain = xgb.DMatrix(X, label=y)
dtrain.feature_names
Then map the fN keys returned by get_fscore() back to those names and plot:
# bst.get_fscore() returns scores keyed by the generated names: {'f0': ..., 'f1': ...}
mapper = {'f{0}'.format(i): v for i, v in enumerate(dtrain.feature_names)}
mapped = {mapper[k]: v for k, v in bst.get_fscore().items()}

# plot_importance also accepts a dict of feature name -> score
xgb.plot_importance(mapped, color='red')
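As a side note (a sketch with synthetic stand-in data and hypothetical column names, not part of the original recipe), the remapping can be avoided altogether by handing the real names to the DMatrix via its feature_names argument:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xgboost as xgb
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the asker's data.
X = pd.DataFrame(np.random.rand(100, 3), columns=['age', 'income', 'score'])
y = np.random.randint(2, size=100)

# Capture the real column names before scaling turns X into a bare array.
feature_names = list(X.columns)
X_scaled = StandardScaler().fit_transform(X)

# Attaching the names to the DMatrix keeps them end to end.
dtrain = xgb.DMatrix(X_scaled, label=y, feature_names=feature_names)
bst = xgb.train({'objective': 'binary:logistic'}, dtrain, num_boost_round=10)

# plot_importance now labels the bars with the real names directly.
xgb.plot_importance(bst, color='red')
plt.show()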
That's all.