I know XGBoost needs the first and second order gradients (gradient and hessian), but has anybody used "mae" as the objective function?
The default XGBoost objective function for predicting numerical values is "reg:squarederror", the squared-error loss for regression predictive modeling problems.
For classification, XGBoost uses a popular metric called log loss, just like most other gradient boosting libraries. This probability-based metric measures the performance of a classification model, and it helps to understand the mathematics behind it before using it to evaluate a model.
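For reference, the binary log loss over $N$ samples with true labels $y_i \in \{0, 1\}$ and predicted probabilities $p_i$ is

$$\text{LogLoss} = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\Big].$$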
The XGBoost Python API provides a way to assess performance incrementally as trees are added. It uses two arguments: eval_set, usually the train and test sets, and the associated eval_metric used to measure the error on those evaluation sets.
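As a rough sketch, assuming the scikit-learn wrapper and a recent XGBoost version where eval_metric is passed to the constructor, monitoring MAE per added tree might look like this (x_train, y_train, x_test, y_test are the same arrays used in the code further down):

import xgboost as xgb

model = xgb.XGBRegressor(
    objective="reg:squarederror",  # default regression objective
    eval_metric="mae",             # report mean absolute error on the eval sets
    n_estimators=100,
)
model.fit(
    x_train, y_train,
    eval_set=[(x_train, y_train), (x_test, y_test)],
    verbose=False,
)
history = model.evals_result()  # per-tree MAE for each evaluation set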
A little bit of theory first, sorry! You asked for the gradient and Hessian of the MAE; however, the MAE is not continuously twice differentiable, so calculating the first and second derivatives becomes tricky. Below we can see the "kink" at x = 0 which prevents the MAE from being continuously differentiable.
Moreover, the second derivative is zero at all points where it is well behaved. In XGBoost, the second derivative appears in the denominator of the leaf weights, and when it is zero it causes serious numerical problems (division by zero and unstable leaf values).
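Concretely, writing $x$ for the residual (prediction minus label), the per-sample MAE is $|x|$ with

$$\frac{\partial |x|}{\partial x} = \operatorname{sign}(x), \qquad \frac{\partial^2 |x|}{\partial x^2} = 0 \qquad (x \neq 0),$$

while XGBoost computes each leaf weight as roughly

$$w^{*} = -\frac{\sum_i g_i}{\sum_i h_i + \lambda},$$

so an all-zero Hessian leaves only the regularization term $\lambda$ in the denominator (and divides by zero when $\lambda = 0$).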
Given these complexities, our best bet is to try to approximate the MAE using some other, nicely behaved function. Let's take a look.
We can see above that there are several functions that approximate the absolute value. Clearly, for very small values, the Squared Error (MSE) is a fairly good approximation of the MAE. However, I assume that this is not sufficient for your use case.
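If the plot is not available to you, a quick numerical comparison (just a sketch; the delta and c parameters below are arbitrary choices) shows how closely each candidate tracks |x|:

import numpy as np

x = np.linspace(-3, 3, 7)
losses = {
    "mae": np.abs(x),
    "mse": x ** 2,
    "pseudo-huber (delta=1)": np.sqrt(1 + x ** 2) - 1,
    "fair (c=1)": np.abs(x) - np.log(np.abs(x) + 1),
    "log-cosh": np.log(np.cosh(x)),
}
for name, vals in losses.items():
    print(f"{name:>24}: {np.round(vals, 2)}")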
The Huber loss is a well-documented loss function. However, it is not twice continuously differentiable, so we cannot guarantee smooth second derivatives. We can approximate it with the Pseudo-Huber function, which can be implemented with the XGBoost Python API as follows:
import numpy as np
import xgboost as xgb

dtrain = xgb.DMatrix(x_train, label=y_train)
dtest = xgb.DMatrix(x_test, label=y_test)
param = {'max_depth': 5}
num_round = 10

def huber_approx_obj(preds, dtrain):
    d = preds - dtrain.get_label()  # for the sklearn interface, labels are passed directly, so drop .get_label()
    h = 1                           # h is delta in the graphic
    scale = 1 + (d / h) ** 2
    scale_sqrt = np.sqrt(scale)
    grad = d / scale_sqrt
    hess = 1 / scale / scale_sqrt
    return grad, hess

bst = xgb.train(param, dtrain, num_round, obj=huber_approx_obj)
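For reference, the Pseudo-Huber loss with parameter $\delta$ (the h in the code above) and its derivatives with respect to the residual $d$ are

$$L_{\delta}(d) = \delta^{2}\left(\sqrt{1 + (d/\delta)^{2}} - 1\right), \qquad \frac{\partial L_{\delta}}{\partial d} = \frac{d}{\sqrt{1 + (d/\delta)^{2}}}, \qquad \frac{\partial^{2} L_{\delta}}{\partial d^{2}} = \frac{1}{\left(1 + (d/\delta)^{2}\right)^{3/2}},$$

which is exactly what huber_approx_obj returns (grad = d / sqrt(scale), hess = 1 / (scale * sqrt(scale))).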
Other functions can be used by replacing obj=huber_approx_obj.
The Fair loss is not well documented at all, but it seems to work rather well. For a residual x and constant c, the Fair loss is c * |x| - c^2 * ln(|x|/c + 1). It can be implemented as follows:
def fair_obj(preds, dtrain):
    """y = c * abs(x) - c**2 * np.log(abs(x)/c + 1)"""
    x = preds - dtrain.get_label()
    c = 1
    den = abs(x) + c
    grad = c * x / den
    hess = c * c / den ** 2
    return grad, hess
This code is taken and adapted from the second place solution in the Kaggle Allstate Challenge.
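The gradient and Hessian in fair_obj follow directly from differentiating the Fair loss with respect to the residual $x$:

$$\text{Fair}_c(x) = c\,|x| - c^{2}\ln\!\left(\frac{|x|}{c} + 1\right), \qquad \frac{\partial}{\partial x}\text{Fair}_c(x) = \frac{c\,x}{|x| + c}, \qquad \frac{\partial^{2}}{\partial x^{2}}\text{Fair}_c(x) = \frac{c^{2}}{(|x| + c)^{2}}.$$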
The Log-Cosh loss function can also be used:
def log_cosh_obj(preds, dtrain):
    x = preds - dtrain.get_label()
    grad = np.tanh(x)
    hess = 1 / np.cosh(x) ** 2
    return grad, hess
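The derivatives here come straight from the definition of the loss:

$$L(x) = \log\big(\cosh(x)\big), \qquad \frac{\partial L}{\partial x} = \tanh(x), \qquad \frac{\partial^{2} L}{\partial x^{2}} = \frac{1}{\cosh^{2}(x)}.$$

Note that np.cosh overflows for very large residuals and the Hessian underflows towards zero there, so in practice you may want to clip x before computing the Hessian.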
Finally, you can create your own custom loss functions using the above functions as templates.
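As a minimal template (a sketch reusing the dtrain and dtest DMatrix objects created above; custom_obj and the parameter values are placeholders), you can train on any surrogate objective while still reporting the true MAE as the evaluation metric:

import numpy as np
import xgboost as xgb

def custom_obj(preds, dtrain):
    """Return the gradient and Hessian of your surrogate loss."""
    x = preds - dtrain.get_label()
    grad = np.tanh(x)             # example: log-cosh gradient
    hess = 1 / np.cosh(x) ** 2    # example: log-cosh Hessian
    return grad, hess

params = {'max_depth': 5, 'eval_metric': 'mae'}  # evaluate with the real MAE
bst = xgb.train(params, dtrain, num_boost_round=10,
                obj=custom_obj, evals=[(dtest, 'test')])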