I'm trying to use MAPE as the eval metric in XGBoost, but I get strange results:
import numpy as np
import xgboost as xgb
# KFold with n/n_folds matches the older sklearn.cross_validation API used here
from sklearn.cross_validation import KFold

def xgb_mape(preds, dtrain):
    labels = dtrain.get_label()
    return 'mape', np.mean(np.abs((labels - preds) / (labels + 1)))

xgp = {"colsample_bytree": 0.9,
       "min_child_weight": 24,
       "subsample": 0.9,
       "eta": 0.05,
       "objective": "reg:linear",
       "seed": 70}

cv = xgb.cv(params=xgp,
            dtrain=xgb.DMatrix(train_set[cols_to_use], label=train_set.y),
            folds=KFold(n=len(train_set), n_folds=4, random_state=707, shuffle=True),
            feval=xgb_mape,
            early_stopping_rounds=10,
            num_boost_round=1000,
            verbose_eval=10,
            maximize=False)
It returns:
[0] train-mape:0.780683+0.00241932 test-mape:0.779896+0.0024619
[10] train-mape:0.84939+0.0196102 test-mape:0.858054+0.0184669
[20] train-mape:1.0778+0.0313676 test-mape:1.10751+0.0293785
[30] train-mape:1.26066+0.0343771 test-mape:1.30707+0.0323237
[40] train-mape:1.37713+0.0347438 test-mape:1.43339+0.030565
[50] train-mape:1.45653+0.042433 test-mape:1.52176+0.0383677
[60] train-mape:1.52268+0.0386395 test-mape:1.5909+0.0353497
[70] train-mape:1.5636+0.0383622 test-mape:1.63482+0.0301809
[80] train-mape:1.59408+0.0378158 test-mape:1.66748+0.0315529
[90] train-mape:1.61712+0.0403532 test-mape:1.69134+0.0325177
[100] train-mape:1.63028+0.0389446 test-mape:1.70578+0.0316045
[110] train-mape:1.63556+0.0375842 test-mape:1.71153+0.031564
[120] train-mape:1.63509+0.0393198 test-mape:1.7117+0.0320471
Both train and test MAPE increase even though maximize=False, and early stopping doesn't work properly. Where is the error?
UPD: adding -1 * to xgb_mape solved the problem. It looks like the maximize parameter doesn't work properly for custom eval functions.
Some background from the XGBoost documentation (a minimal sketch tying these pieces together follows this list):
- eval_metric [default according to objective]: evaluation metric(s) for validation data; a default metric is assigned according to the objective (rmse for regression, logloss for classification, mean average precision for ranking).
- DMatrix is the internal data structure used by XGBoost, optimized for both memory efficiency and training speed. It can be constructed from multiple different sources of data.
- reg_alpha (alias: alpha) is the L1 regularization parameter; increasing its value makes the model more conservative (default 0). reg_lambda (alias: lambda) is the L2 regularization parameter; increasing it also makes the model more conservative (default 1).
- The objective function used when predicting numerical values is "reg:squarederror", the squared-error loss for regression predictive modeling problems.
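For concreteness, here is a minimal sketch of how these pieces fit together; the data, parameter values, and number of boosting rounds are made up purely for illustration:

import numpy as np
import xgboost as xgb

# Illustrative data; shapes and values are arbitrary.
X = np.random.rand(100, 5)
y = np.random.rand(100)

# DMatrix can be built from numpy arrays, pandas DataFrames, and other sources.
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "reg:squarederror",  # squared-error loss for regression
    "eval_metric": "rmse",            # explicit metric; rmse is also the regression default
    "reg_alpha": 0.0,                 # L1 regularization (default 0)
    "reg_lambda": 1.0,                # L2 regularization (default 1)
}

booster = xgb.train(params, dtrain, num_boost_round=50)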
According to this xgboost example of implementing an Average Precision metric, since the xgb optimizer only minimizes, if you implement a metric that should be maximized, you have to add a negative sign (-) in front of it, like so:

def pr_auc_metric(y_predicted, y_true):
    return 'pr_auc', -skmetrics.average_precision_score(y_true.get_label(), y_predicted)
So yours would be:

def xgb_mape(preds, dtrain):
    labels = dtrain.get_label()
    return 'mape', -np.mean(np.abs((labels - preds) / (labels + 1)))
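As a quick sanity check of the (name, value) tuple a custom feval must return, you could evaluate the negated metric on a toy DMatrix; the numbers below are invented for illustration:

import numpy as np
import xgboost as xgb

# Toy data just to exercise the metric's interface.
toy = xgb.DMatrix(np.arange(6, dtype=float).reshape(3, 2),
                  label=np.array([10.0, 20.0, 30.0]))
name, value = xgb_mape(np.array([12.0, 18.0, 33.0]), toy)
print(name, value)  # 'mape' and a negative number; a value closer to zero means lower MAPE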