
XGBoost plot importance has no property max_num_features

xgboost's plotting API states:

xgboost.plot_importance(booster, ax=None, height=0.2, xlim=None, ylim=None, title='Feature importance', xlabel='F score', ylabel='Features', importance_type='weight', max_num_features=None, grid=True, **kwargs)

Plot importance based on fitted trees.

Parameters:

booster (Booster, XGBModel or dict) – Booster or XGBModel instance, or dict taken by Booster.get_fscore()
...
max_num_features (int, default None) – Maximum number of top features displayed on plot. If None, all features will be displayed.

In my implementation, however, running:

booster_ = XGBClassifier(learning_rate=0.1, max_depth=3, n_estimators=100, 
                      silent=False, objective='binary:logistic', nthread=-1, 
                      gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, 
                      colsample_bytree=1, colsample_bylevel=1, reg_alpha=0,
                      reg_lambda=1, scale_pos_weight=1, base_score=0.5, seed=0)

booster_.fit(X_train, y_train)

from xgboost import plot_importance
plot_importance(booster_, max_num_features=10)

Returns:

AttributeError: Unknown property max_num_features

Running it without the max_num_features parameter, however, correctly plots the entire feature set (which in my case is gigantic, ~10k features). Any idea what's going on?

Thanks in advance.

Details:

> python -V
  Python 2.7.12 :: Anaconda custom (x86_64)

> pip freeze | grep xgboost
  xgboost==0.4a30
asked Feb 26 '17 by Carlo Mazzaferro

3 Answers

Try upgrading your xgboost library to 0.6; that should solve the problem. To upgrade the package, run:

$ pip install -U xgboost

If you get an error, try this:

$ brew install gcc@5
$ pip install -U xgboost

(See https://github.com/dmlc/xgboost/issues/1501.)

answered Nov 15 '22 by Tamirlan

Until further notice, I've worked around the problem (at least partially) with this script:

import matplotlib.pyplot as plt

def feat_imp(df, model, n_features):
    # Map column names to the fitted model's importance scores
    d = dict(zip(df.columns, model.feature_importances_))
    # Sort feature names by importance, descending, and keep the top n
    top_names = sorted(d, key=d.get, reverse=True)[:n_features]

    plt.figure(figsize=(15, 15))
    plt.title("Feature importances")
    plt.bar(range(n_features), [d[name] for name in top_names], color="r", align="center")
    plt.xlim(-1, n_features)
    plt.xticks(range(n_features), top_names, rotation='vertical')
    plt.show()

feat_imp(filled_train_full, booster_, 20)
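The sorting step above can be checked in isolation: sorted() over the dict's keys with d.get as the key function orders feature names by their importance score. (The names and scores here are made up for illustration.)

```python
# Illustrative scores: sort names by value, descending, keep the top 2
d = {'age': 0.30, 'income': 0.45, 'zip': 0.05, 'tenure': 0.20}
top_names = sorted(d, key=d.get, reverse=True)[:2]
print(top_names)  # ['income', 'age']
```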


answered Nov 15 '22 by Carlo Mazzaferro

Despite the title of the documentation webpage ("Python API Reference - xgboost 0.6 documentation"), it does not contain the documentation for the 0.6 release of xgboost. Instead it seems to contain the documentation for the latest git master branch.

The 0.6 release of xgboost was made on Jul 29 2016:

This is a stable release of 0.6 version

@tqchen tqchen released this on Jul 29 2016 · 245 commits to master since this release

The commit that added plot_importance()'s max_num_features was made on Jan 16 2017.

As a further check, let's inspect the 0.60 release tarball:

pushd /tmp
curl -SLO https://github.com/dmlc/xgboost/archive/v0.60.tar.gz
tar -xf v0.60.tar.gz 
grep num_features xgboost-0.60/python-package/xgboost/plotting.py
# ... no output: max_num_features does not appear in the 0.60 sources

Therefore this seems to be a documentation bug with the xgboost project.
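Until an upgrade is possible, one workaround follows from the documentation quoted in the question: plot_importance also accepts the dict returned by Booster.get_fscore(), so the top-n trimming can be done before plotting instead of via the unsupported keyword. A minimal sketch, where top_n_scores is a hypothetical helper and the scores are made up:

```python
def top_n_scores(scores, n):
    """Keep the n highest-scoring entries of a get_fscore()-style dict."""
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n])

# Illustrative get_fscore()-style output
scores = {'f0': 12, 'f1': 40, 'f2': 7, 'f3': 25}
top = top_n_scores(scores, 2)
print(top)  # {'f1': 40, 'f3': 25}
```

Passing the trimmed dict to plot_importance would then draw only those features, with no unsupported keyword argument involved.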

answered Nov 15 '22 by Ray Donnelly