Let's say I build an xgboost model:
bst = xgb.train(param0, dtrain1, num_round, evals=[(dtrain1, "training")])
Where:
Then, I save the model to disk:
bst.save_model("xgbmodel")
Later on, I want to reload the saved model and continue training it with dtrain2.
Does anyone have an idea how to do this?
You don't even have to load the model from disk yourself before retraining.
All you need to do is run the same xgb.train command with one additional parameter, xgb_model, set to either the full path of the xgboost model you saved (as in the question) or a Booster object.
Example:
bst = xgb.train(param0, dtrain2, num_round, evals=[(dtrain2, "training")], xgb_model='xgbmodel')
Good luck!
For users who want to continue training with XGBClassifier, or with any model object obtained from the sklearn-style .fit function:
from xgboost import XGBClassifier
# best_est = best number of trees
# best_lr = best learning rate
# best_subsample = best subsample ratio, between 0 and 1
params = {'objective': 'binary:logistic', 'use_label_encoder': False,
'seed': 27, 'eval_metric': 'logloss', 'n_estimators': best_est,
'learning_rate': best_lr, 'subsample': best_subsample}
# train iteration 1 below
model = XGBClassifier(**params)
model.fit(x_train_1, y_train_1)
# train iteration 2 below
model = model.fit(x_train_2, y_train_2, xgb_model=model.get_booster())
In the above code, x_train_* and y_train_* are pandas DataFrame objects.
The main concept here is that the xgboost core functions always take a booster as input when retraining. So you can provide either the booster from the model object or the path to a saved model.