I already know that "`xgboost.XGBRegressor` is a Scikit-Learn Wrapper interface for XGBoost." But do they have any other differences?
`xgboost.train` is the low-level API to train the model via the gradient boosting method.

`xgboost.XGBRegressor` and `xgboost.XGBClassifier` are the wrappers ("Scikit-Learn-like wrappers", as they call them) that prepare the `DMatrix` and pass in the corresponding objective function and parameters. In the end, the `fit` call simply boils down to:
```python
self._Booster = train(params, dmatrix,
                      self.n_estimators, evals=evals,
                      early_stopping_rounds=early_stopping_rounds,
                      evals_result=evals_result, obj=obj, feval=feval,
                      verbose_eval=verbose)
```
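To make the relationship concrete, here is a minimal sketch (the data and parameter values are invented purely for illustration) of training the same kind of model through both APIs:

```python
import numpy as np
import xgboost as xgb

# Toy data, invented for illustration
X = np.random.rand(100, 5)
y = np.random.rand(100)

# High-level scikit-learn wrapper
reg = xgb.XGBRegressor(n_estimators=10, max_depth=3, learning_rate=0.1)
reg.fit(X, y)

# Roughly equivalent low-level call: you build the DMatrix and
# spell out the parameters yourself
dtrain = xgb.DMatrix(X, label=y)
params = {"max_depth": 3, "eta": 0.1, "objective": "reg:squarederror"}
booster = xgb.train(params, dtrain, num_boost_round=10)

# Both end up producing a Booster; the wrapper just stores it internally
print(type(reg.get_booster()), type(booster))
```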
This means that everything that can be done with `XGBRegressor` and `XGBClassifier` is doable via the underlying `xgboost.train` function. The other way around is obviously not true; for instance, some useful parameters of `xgboost.train` are not supported in the `XGBModel` API. The list of notable differences includes:
- `xgboost.train` allows setting the `callbacks` applied at the end of each iteration.
- `xgboost.train` allows training continuation via the `xgb_model` parameter (see the continuation sketch below).
- `xgboost.train` allows not only minimization of the eval function, but maximization as well.

@Maxim, as of xgboost 0.90 (or much before), these differences don't exist anymore, in that `xgboost.XGBClassifier.fit`:
- has `callbacks`
- allows continuation with the `xgb_model` parameter

What I find is different is `evals_result`, in that it has to be retrieved separately after fit (`clf.evals_result()`), and the resulting `dict` is different because it can't take advantage of the names of the evals in the watchlist (`watchlist = [(d_train, 'train'), (d_valid, 'valid')]`).
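To see that `evals_result` difference concretely, here is a small sketch (data invented; the exact metric names can vary by version). The wrapper auto-names the eval sets `validation_0`, `validation_1`, ..., while `xgb.train` keys the dict by the watchlist names you supply:

```python
import numpy as np
import xgboost as xgb

X_train = np.random.rand(80, 5)
y_train = np.random.randint(0, 2, 80)
X_valid = np.random.rand(20, 5)
y_valid = np.random.randint(0, 2, 20)

# Wrapper: eval-set names are auto-generated
clf = xgb.XGBClassifier(n_estimators=5)
clf.fit(X_train, y_train,
        eval_set=[(X_train, y_train), (X_valid, y_valid)],
        verbose=False)
print(clf.evals_result().keys())   # dict_keys(['validation_0', 'validation_1'])

# Low-level: the watchlist names become the dict keys
d_train = xgb.DMatrix(X_train, label=y_train)
d_valid = xgb.DMatrix(X_valid, label=y_valid)
watchlist = [(d_train, 'train'), (d_valid, 'valid')]
evals_result = {}
xgb.train({'objective': 'binary:logistic'}, d_train, num_boost_round=5,
          evals=watchlist, evals_result=evals_result, verbose_eval=False)
print(evals_result.keys())         # dict_keys(['train', 'valid'])
```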
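And as a sketch of the training-continuation point from the list above (data again invented; per the comment, recent versions of `fit` accept an `xgb_model` keyword as well):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 5)
y = np.random.rand(100)
dtrain = xgb.DMatrix(X, label=y)
params = {"objective": "reg:squarederror", "max_depth": 3}

# Train 10 rounds, then continue training the same booster for 10 more
booster = xgb.train(params, dtrain, num_boost_round=10)
booster = xgb.train(params, dtrain, num_boost_round=10, xgb_model=booster)

print(len(booster.get_dump()))  # 20 trees after the two calls
```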