Difference between original xgboost (Learning API) and sklearn XGBClassifier (Scikit-Learn API)

I use the xgboost scikit-learn interface below to create and train model-1:

clf = xgb.XGBClassifier(n_estimators=100, objective='binary:logistic')
clf.fit(x_train, y_train, early_stopping_rounds=10, eval_metric="auc",
        eval_set=[(x_valid, y_valid)])

And an equivalent model can be created with the original xgboost API as model-2 below:

param = {}
param['objective'] = 'binary:logistic'
param['eval_metric'] = 'auc'
num_rounds = 100
xgtrain = xgb.DMatrix(x_train, label=y_train)
xgval = xgb.DMatrix(x_valid, label=y_valid)
watchlist = [(xgtrain, 'train'), (xgval, 'val')]
model = xgb.train(param, xgtrain, num_rounds, watchlist, early_stopping_rounds=10)

I think all the parameters are the same between model-1 and model-2, but the validation scores are different. Is there any difference between model-1 and model-2?
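Part of the answer is visible without touching the library at all: the sklearn wrapper translates its keyword names (e.g. learning_rate) into native parameter names (e.g. eta) and fills in its own defaults for anything left unset, and those defaults need not match the native API's defaults. A pure-Python sketch of that translation (the alias table and default values here are illustrative, taken from older xgboost releases, not the library's actual code):

```python
# Sketch: how a sklearn-style wrapper might build the native param dict.
# Defaults below reflect older xgboost releases and are illustrative only.

NATIVE_DEFAULTS = {'eta': 0.3, 'max_depth': 6}             # xgb.train() fallbacks
WRAPPER_DEFAULTS = {'learning_rate': 0.1, 'max_depth': 3}  # XGBClassifier fallbacks
ALIASES = {'learning_rate': 'eta'}                          # wrapper name -> native name

def wrapper_to_native(**user_kwargs):
    """Merge user kwargs over the wrapper's defaults, then rename to native keys."""
    merged = {**WRAPPER_DEFAULTS, **user_kwargs}
    return {ALIASES.get(k, k): v for k, v in merged.items()}

# Passing only objective (as in model-1) silently picks up the wrapper's defaults,
# which differ from what xgb.train would have used for the same untouched keys:
native_from_wrapper = wrapper_to_native(objective='binary:logistic')
native_direct = {**NATIVE_DEFAULTS, 'objective': 'binary:logistic'}
print(native_from_wrapper)
print(native_direct)
```

So two configurations that look identical in the code you write can still train with different effective eta and max_depth, which is enough to change the validation score.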

ybdesire asked Jun 21 '16 11:06


1 Answer

As I understand it, many of the default parameters differ between native xgb and its sklearn interface. For example, native xgb defaults to eta=0.3, while the sklearn wrapper defaults to learning_rate=0.1 (its name for eta). You can see the default parameters of each implementation here:

https://github.com/dmlc/xgboost/blob/master/doc/parameter.md
http://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn

Du Phan answered Oct 19 '22 11:10