 

xgboost.train versus XGBClassifier

I am using Python to fit an xgboost model incrementally (chunk by chunk). I came across a solution that uses xgboost.train, but I do not know what to do with the Booster object it returns. The XGBClassifier, by contrast, has methods like fit, predict, and predict_proba.

Here is what happens inside the for loop in which I read the data in little by little:

import xgboost as xgb

dtrain = xgb.DMatrix(X_train, label=y)
param = {'max_depth': 2, 'eta': 1, 'silent': 1, 'objective': 'binary:logistic'}
# continue training from the booster saved on the previous iteration
modelXG = xgb.train(param, dtrain, xgb_model='xgbmodel')
modelXG.save_model('xgbmodel')
asked May 03 '18 by Max

2 Answers

XGBClassifier is a scikit-learn compatible class which can be used in conjunction with other scikit-learn utilities.

Other than that, it is just a wrapper over xgb.train, so you do not need to supply advanced objects like Booster and DMatrix yourself.

Just pass your data to fit(), predict(), etc., and internally it will be converted to the appropriate objects automatically.
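For example, here is a minimal sketch of the two styles side by side (the synthetic dataset from scikit-learn is only for illustration, and the parameter values mirror those in the question); the wrapper simply builds the DMatrix for you:

import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# scikit-learn style: plain arrays in, fit/predict/predict_proba out
clf = xgb.XGBClassifier(max_depth=2, learning_rate=1.0, objective='binary:logistic')
clf.fit(X, y)
proba_sklearn = clf.predict_proba(X)[:, 1]

# native style: wrap the data in a DMatrix yourself and call xgb.train
dtrain = xgb.DMatrix(X, label=y)
params = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic'}
booster = xgb.train(params, dtrain, num_boost_round=100)
proba_native = booster.predict(dtrain)  # for binary:logistic, predict() returns probabilities

Either way you end up with the same kind of gradient-boosted trees; only the interface differs.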

answered Nov 12 '22 by Vivek Kumar


I'm not entirely sure what your question was. xgb.XGBClassifier.fit() calls xgb.train() under the hood, so it is just a matter of matching up the arguments of the relevant functions.

If you are interested in how to implement the incremental learning you have in mind, you can do

clf = xgb.XGBClassifier(**params)
clf.fit(X, y, xgb_model=your_model)

See the documentation here. On each iteration you will have to save the booster using something like clf.get_booster().save_model(xxx).

P.S. I hope you do the learning in mini-batches, i.e. chunks, and not literally line by line (example by example), as that would result in a performance drop due to writing and reading the model every time.
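To make the chunked loop concrete, here is a rough sketch; get_chunks() is a hypothetical generator yielding (X_chunk, y_chunk) arrays, and the file name is arbitrary. On the first pass xgb_model is None, afterwards it points at the booster saved on the previous pass:

import xgboost as xgb

params = {'max_depth': 2, 'learning_rate': 1.0, 'objective': 'binary:logistic'}
model_path = 'xgbmodel.json'   # arbitrary path for the saved booster
prev_model = None              # nothing to continue from on the first chunk

for X_chunk, y_chunk in get_chunks():   # get_chunks() is assumed, not part of xgboost
    clf = xgb.XGBClassifier(**params)
    clf.fit(X_chunk, y_chunk, xgb_model=prev_model)   # continue training if prev_model is set
    clf.get_booster().save_model(model_path)
    prev_model = model_path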

answered Nov 12 '22 by Mischa Lisovyi