
XGBoost regression - Predicted values out of training bounds

A dataset containing various features and a regression target (called qval) was used to train an XGBoost regressor. This value, qval, is between 0 and 1 and should have the following distribution:

[histogram of the qval distribution in the training data]

So far, so good. However, when I save the model with xgb.save_model(), re-load it with xgb.load_model(), and predict qval on another dataset, the predicted qval falls outside the [0, 1] boundary, as shown here:

[histogram of predicted qval on the new dataset]

Could someone explain whether this is normal, and if so, why it happens? My guess is that the learned function (for lack of a better word) that computes qval was fit to the training data, and its weights don't actually enforce the [0, 1] boundary. Therefore, when applying those weights to new data, the result can be out of bounds. I'm not entirely sure, though.
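To rule out the serialization step, I verified that a save/load round trip reproduces predictions exactly. A minimal sketch with stand-in data (my real features and qval differ):

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression

# Hypothetical stand-in data, just to demonstrate the round trip
X, y = make_regression(random_state=0)
model = xgb.XGBRegressor(random_state=0).fit(X, y)

model.save_model("qval_model.json")
reloaded = xgb.XGBRegressor()
reloaded.load_model("qval_model.json")

# The round trip preserves the model: predictions are identical
assert np.allclose(model.predict(X), reloaded.predict(X))

So the out-of-bounds values are not caused by saving and re-loading.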

asked Nov 20 '25 by Vanhaeren Thomas

1 Answer

Yes, xgboost can make predictions outside the range of the training labels. The same behavior is easy to reproduce with scikit-learn's gradient boosting:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingRegressor

# Binary labels (0/1) used as a regression target,
# mimicking a target bounded in [0, 1]
X, y = make_classification(random_state=42)

gbm = GradientBoostingRegressor(max_depth=1,
                                n_estimators=10,
                                learning_rate=1,
                                random_state=42)
gbm.fit(X, y)

# Even on the training data itself, predictions overshoot [0, 1]
preds = gbm.predict(X)
print(preds.min(), preds.max())
# Output
# -0.010418732339562916 1.134566081403055
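The same thing happens with xgboost itself. Here is a minimal sketch using the xgboost sklearn wrapper, with hyperparameters chosen to mirror the example above (the exact min/max will differ):

from sklearn.datasets import make_classification
import xgboost as xgb

X, y = make_classification(random_state=42)

# The default reg:squarederror objective does not bound the output
model = xgb.XGBRegressor(max_depth=1,
                         n_estimators=10,
                         learning_rate=1.0,
                         random_state=42)
model.fit(X, y)
preds = model.predict(X)
print(preds.min(), preds.max())  # values outside [0, 1] are possible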

If the overshoot only appears on new data, it probably means that your test set is distributed differently from your training set.

For random forests and decision trees this does not happen: their predictions are averages of training labels, so they always stay within the label range.

The phenomenon is specific to how boosting ensembles work: the prediction is a sum of stage-wise corrections (each tree fits the residuals of the previous ones), and nothing constrains that sum to the range of the original labels.
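If the predictions must stay in [0, 1], two common workarounds (not part of the original answer, but standard xgboost usage) are the reg:logistic objective, which passes the raw score through a sigmoid, or simple post-hoc clipping:

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(random_state=42)

# Option 1: a sigmoid-linked objective keeps outputs in (0, 1);
# reg:logistic requires the labels themselves to lie in [0, 1]
bounded = xgb.XGBRegressor(objective="reg:logistic", random_state=42)
bounded.fit(X, y)
print(bounded.predict(X).min(), bounded.predict(X).max())

# Option 2: clip predictions from an unbounded squared-error model
raw = xgb.XGBRegressor(random_state=42).fit(X, y).predict(X)
clipped = np.clip(raw, 0.0, 1.0)

The sigmoid link treats the bounded target as a probability-like quantity, so it is the more principled fix; clipping is simpler but distorts values near the boundaries.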

answered Nov 22 '25 by Carlos Mougan


