xgboost's plotting API states:
xgboost.plot_importance(booster, ax=None, height=0.2, xlim=None, ylim=None, title='Feature importance', xlabel='F score', ylabel='Features', importance_type='weight', max_num_features=None, grid=True, **kwargs)
Plot importance based on fitted trees.
Parameters:
booster (Booster, XGBModel or dict) – Booster or XGBModel instance, or dict taken by Booster.get_fscore()
...
max_num_features (int, default None) – Maximum number of top features displayed on plot. If None, all features will be displayed.
In my implementation, however, running:
from xgboost import XGBClassifier, plot_importance

booster_ = XGBClassifier(learning_rate=0.1, max_depth=3, n_estimators=100,
                         silent=False, objective='binary:logistic', nthread=-1,
                         gamma=0, min_child_weight=1, max_delta_step=0, subsample=1,
                         colsample_bytree=1, colsample_bylevel=1, reg_alpha=0,
                         reg_lambda=1, scale_pos_weight=1, base_score=0.5, seed=0)
booster_.fit(X_train, y_train)

plot_importance(booster_, max_num_features=10)
raises:
AttributeError: Unknown property max_num_features
Running it without the max_num_features parameter correctly plots the entire feature set (which in my case is gigantic, ~10k features).
Any ideas of what's going on?
Thanks in advance.
Details:
> python -V
Python 2.7.12 :: Anaconda custom (x86_64)
> pip freeze | grep xgboost
xgboost==0.4a30
Try upgrading your xgboost library to 0.6; that should solve the problem. To upgrade the package:
$ pip install -U xgboost
If you get a build error (common on macOS), install gcc first and retry:
$ brew install gcc@5
$ pip install -U xgboost
(See https://github.com/dmlc/xgboost/issues/1501.)
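After the upgrade, a quick sanity check (a minimal sketch, assuming the package exposes __version__ as usual) is to print the installed version and retry the call that previously raised AttributeError:

import xgboost
print(xgboost.__version__)  # confirm which version is now installed

from xgboost import plot_importance
plot_importance(booster_, max_num_features=10)  # retry the failing call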
Until further notice I've solved the problem (at least partially) with this script:
import matplotlib.pyplot as plt

def feat_imp(df, model, n_features):
    # Map each column name to the model's importance score for it
    d = dict(zip(df.columns, model.feature_importances_))
    # Sort feature names by importance, descending, and keep the top n
    top_names = sorted(d, key=d.get, reverse=True)[:n_features]
    plt.figure(figsize=(15, 15))
    plt.title("Feature importances")
    plt.bar(range(n_features), [d[name] for name in top_names], color="r", align="center")
    plt.xlim(-1, n_features)
    plt.xticks(range(n_features), top_names, rotation='vertical')
    plt.show()

feat_imp(filled_train_full, booster_, 20)
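If you'd rather keep xgboost's own plot, a variant (a sketch, assuming your version's sklearn wrapper exposes the underlying Booster via booster(); newer releases renamed it to get_booster()) is to slice the F-score dict to the top N yourself and pass it in, since per the docs quoted in the question plot_importance also accepts a dict taken by Booster.get_fscore():

from xgboost import plot_importance

# get_fscore() returns {feature_name: score}; keep only the 10
# highest-scoring entries and hand the reduced dict to plot_importance
scores = booster_.booster().get_fscore()
top_n = dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:10])
plot_importance(top_n)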
Despite the title of the documentation webpage ("Python API Reference - xgboost 0.6 documentation"), it does not contain the documentation for the 0.6 release of xgboost. Instead, it seems to contain the documentation for the latest git master branch.
The 0.6 release of xgboost was made on Jul 29 2016:
This is a stable release of 0.6 version
@tqchen tqchen released this on Jul 29 2016 · 245 commits to master since this release
The commit that added plot_importance()'s max_num_features was made on Jan 16 2017.
As a further check, let's inspect the 0.60 release tarball:
pushd /tmp
curl -SLO https://github.com/dmlc/xgboost/archive/v0.60.tar.gz
tar -xf v0.60.tar.gz
grep num_features xgboost-0.60/python-package/xgboost/plotting.py
# (no output: max_num_features does not appear in the 0.60 sources)
Therefore this seems to be a documentation bug in the xgboost project.
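In the meantime, rather than trusting the docs, you can ask the installed function directly which parameters it accepts; a minimal check using the standard library's inspect module (getargspec, since the question's environment is Python 2.7):

import inspect
from xgboost import plot_importance

# Lists the named parameters of the installed plot_importance;
# max_num_features appears only in builds that include the Jan 2017 commit.
print(inspect.getargspec(plot_importance).args)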