I am using Python to fit an XGBoost model incrementally (chunk by chunk). I came across a solution that uses xgboost.train, but I do not know what to do with the Booster object that it returns. For instance, XGBClassifier has methods like fit, predict, predict_proba, etc.
Here is what happens inside the for loop in which I read the data in little by little:
dtrain = xgb.DMatrix(X_train, label=y)
param = {'max_depth': 2, 'eta': 1, 'silent': 1, 'objective': 'binary:logistic'}
# continue training from the model saved on the previous iteration
modelXG = xgb.train(param, dtrain, xgb_model='xgbmodel')
modelXG.save_model('xgbmodel')
XGBClassifier is a scikit-learn compatible class which can be used in conjunction with other scikit-learn utilities. Other than that, it's just a wrapper over xgb.train, in which you don't need to supply advanced objects like Booster etc. Just send your data to fit(), predict(), etc. and internally it will be converted to the appropriate objects automatically.
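For illustration, here is a minimal sketch of the two interfaces side by side (the data, hyperparameters, and number of boosting rounds are placeholders I chose, not from the question):

import numpy as np
import xgboost as xgb

# Placeholder data, purely for illustration
X_train = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

# Low-level API: you build the DMatrix and work with the returned Booster
dtrain = xgb.DMatrix(X_train, label=y)
booster = xgb.train({'objective': 'binary:logistic'}, dtrain, num_boost_round=10)
proba = booster.predict(xgb.DMatrix(X_train))  # probabilities for binary:logistic

# scikit-learn API: plain arrays in, familiar estimator methods out
clf = xgb.XGBClassifier(n_estimators=10, objective='binary:logistic')
clf.fit(X_train, y)
labels = clf.predict(X_train)        # predicted class labels
probas = clf.predict_proba(X_train)  # class probabilities, shape (n_samples, 2)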
I'm not entirely sure what your question was. xgb.XGBClassifier.fit() under the hood calls xgb.train(), so it is a matter of matching up the arguments of the relevant functions.
If you are interested in how to implement the learning that you have in mind, then you can do

clf = xgb.XGBClassifier(**params)
clf.fit(X, y, xgb_model=your_model)

See the documentation for XGBClassifier.fit(). On each iteration you will have to save the booster using something like clf.get_booster().save_model(xxx).
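Putting it together, a minimal sketch of the chunk-by-chunk loop could look like this (iter_chunks, the file name, and the hyperparameters are my own placeholder assumptions, not from the answer):

import xgboost as xgb

params = {'max_depth': 2, 'learning_rate': 1.0, 'n_estimators': 10,
          'objective': 'binary:logistic'}
model_path = 'xgbmodel.json'  # hypothetical file name

booster = None  # nothing to continue from on the first chunk
for X_chunk, y_chunk in iter_chunks():  # hypothetical generator yielding mini-batches
    clf = xgb.XGBClassifier(**params)
    # xgb_model=None trains from scratch; a Booster (or a saved model path)
    # makes fit() continue training on top of the existing trees
    clf.fit(X_chunk, y_chunk, xgb_model=booster)
    booster = clf.get_booster()
    booster.save_model(model_path)

Each fit() call adds n_estimators new trees on top of the loaded booster, so the model grows by a fixed number of boosting rounds per chunk.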
PS: I hope you do the learning in mini-batches, i.e. chunks, and not literally line by line, i.e. example by example, as that would result in a performance drop due to writing/reading the model each time.