 

XGBoost - How to use "mae" as objective function?

I know XGBoost needs the first and second order gradients (the gradient and the Hessian) of the objective, but has anybody else used "mae" as the objective function?

Sam Qian asked Jul 10 '17 07:07


People also ask

What is objective function in XGBoost?

The default XGBoost objective function used when predicting numerical values is “reg:squarederror”, the squared-error loss function for regression predictive modeling problems.
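As an illustration (not part of the original answer), a minimal sketch of setting this objective explicitly through the scikit-learn wrapper; X_train and y_train are placeholder arrays:

from xgboost import XGBRegressor

# minimal sketch: X_train and y_train are placeholder training arrays
model = XGBRegressor(objective='reg:squarederror', n_estimators=100)
model.fit(X_train, y_train)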

What loss function does XGBoost use?

XGBoost uses a popular metric called 'log loss', just like most other gradient boosting algorithms. This probability-based metric is used to measure the performance of a classification model. However, it is necessary to understand the mathematics behind it before we start using it to evaluate our model.

What does Eval_metric do in XGBoost?

XGBoost's Python API provides a method to assess the incremental performance by the incremental number of trees. It uses two arguments: “eval_set” — usually Train and Test sets — and the associated “eval_metric” to measure your error on these evaluation sets.
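A small hedged sketch of this pattern with the scikit-learn wrapper (X_train, y_train, X_test, y_test are placeholders; in recent XGBoost versions eval_metric is passed to the estimator rather than to fit):

from xgboost import XGBRegressor

# sketch only: eval_metric is set on the estimator, eval_set on fit()
model = XGBRegressor(objective='reg:squarederror', eval_metric='mae')
model.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_test, y_test)])
print(model.evals_result())  # per-round MAE for each evaluation set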


1 Answer

A little bit of theory first, sorry! You asked for the gradient and Hessian of the MAE; however, the MAE is not continuously twice differentiable, so calculating the first and second derivatives becomes tricky. Below we can see the "kink" at x = 0 which prevents the MAE from being continuously differentiable.

Moreover, the second derivative is zero at all the points where it is well behaved. In XGBoost, the second derivative is used as a denominator in the leaf weights, and when it is zero this creates serious numerical errors.
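To make the problem concrete, here is a sketch (an illustration of the failure, not something to train with) of what a naive MAE objective would look like: the gradient is just the sign of the residual and the Hessian is identically zero, which is exactly the zero-denominator problem described above.

import numpy as np

def mae_obj_naive(preds, dtrain):
    # illustration only: why raw MAE breaks down as an XGBoost objective
    d = preds - dtrain.get_label()
    grad = np.sign(d)         # undefined at d = 0 (the "kink")
    hess = np.zeros_like(d)   # zero Hessian -> division by zero in the leaf weights
    return grad, hess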

Given these complexities, our best bet is to try to approximate the MAE using some other, nicely behaved function. Let's take a look.

(Figure: some different loss functions that approximate the absolute value)

We can see above that there are several functions that approximate the absolute value. Clearly, for very small values, the Squared Error (MSE) is a fairly good approximation of the MAE. However, I assume that this is not sufficient for your use case.

Huber loss is a well-documented loss function. However, it is not smooth, so we cannot guarantee smooth derivatives. We can approximate it using the Pseudo-Huber function, which can be implemented in Python XGBoost as follows:

import numpy as np
import xgboost as xgb

dtrain = xgb.DMatrix(x_train, label=y_train)
dtest = xgb.DMatrix(x_test, label=y_test)

param = {'max_depth': 5}
num_round = 10

def huber_approx_obj(preds, dtrain):
    d = preds - dtrain.get_label()  # remove .get_label() for the sklearn API
    h = 1  # h is delta in the graphic
    scale = 1 + (d / h) ** 2
    scale_sqrt = np.sqrt(scale)
    grad = d / scale_sqrt
    hess = 1 / scale / scale_sqrt
    return grad, hess

bst = xgb.train(param, dtrain, num_round, obj=huber_approx_obj)
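A possible follow-up (an assumption on my part, not from the original answer): the custom objective only replaces the training loss, so you can still monitor plain MAE on the hold-out set through eval_metric, for example,

param = {'max_depth': 5, 'eval_metric': 'mae'}
bst = xgb.train(param, dtrain, num_round,
                evals=[(dtrain, 'train'), (dtest, 'test')],
                obj=huber_approx_obj)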

Other objective functions can be used by swapping in a different callable for obj=huber_approx_obj.

Fair loss is not well documented at all, but it seems to work rather well. The Fair loss function is:

Fair loss: L(x) = c * |x| - c^2 * log(|x| / c + 1)

It can be implemented as follows:

def fair_obj(preds, dtrain):
    """Fair loss: y = c * abs(x) - c**2 * np.log(abs(x)/c + 1)"""
    x = preds - dtrain.get_label()
    c = 1
    den = abs(x) + c
    grad = c * x / den
    hess = c * c / den ** 2
    return grad, hess

This code is taken and adapted from the second place solution in the Kaggle Allstate Challenge.

The Log-Cosh loss function can be implemented in the same way:

def log_cosh_obj(preds, dtrain):
    x = preds - dtrain.get_label()
    grad = np.tanh(x)
    hess = 1 / np.cosh(x) ** 2
    return grad, hess

Finally, you can create your own custom loss functions using the above functions as templates.
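For example, a hedged sketch of such a template: make_huber_obj is a hypothetical factory (not part of the original answer) that wraps the Pseudo-Huber objective in a closure, so the delta parameter h can be tuned without editing the function body.

def make_huber_obj(h):
    # hypothetical factory: returns a Pseudo-Huber objective with delta = h
    def obj(preds, dtrain):
        d = preds - dtrain.get_label()
        scale = 1 + (d / h) ** 2
        scale_sqrt = np.sqrt(scale)
        return d / scale_sqrt, 1 / scale / scale_sqrt
    return obj

bst = xgb.train(param, dtrain, num_round, obj=make_huber_obj(h=2.0))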

Little Bobby Tables answered Sep 28 '22 21:09