 

XGBoost - How to use "mae" as objective function?

I know XGBoost needs the first and second order gradients (the gradient and the Hessian) of the objective, but has anybody else used "mae" as the objective function?

Sam Qian asked Jul 10 '17 07:07


People also ask

What is objective function in XGBoost?

The default XGBoost objective function used when predicting numerical values is “reg:squarederror”, the squared-error loss function for regression predictive modeling problems.
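As an illustration (not part of the original answer), a minimal sketch of setting this objective explicitly through the scikit-learn wrapper; X_train and y_train are placeholder arrays:

from xgboost import XGBRegressor

# minimal sketch: X_train and y_train are placeholder training arrays
model = XGBRegressor(objective='reg:squarederror', n_estimators=100)
model.fit(X_train, y_train)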

What loss function does XGBoost use?

XGBoost uses a popular metric called 'log loss', just like most other gradient boosting algorithms. This probability-based metric is used to measure the performance of a classification model. However, it is necessary to understand the mathematics behind it before we start using it to evaluate our model.

What does Eval_metric do in XGBoost?

XGBoost's Python API provides a method to assess the incremental performance by the incremental number of trees. It uses two arguments: “eval_set” — usually Train and Test sets — and the associated “eval_metric” to measure your error on these evaluation sets.
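A small hedged sketch of this pattern with the scikit-learn wrapper (X_train, y_train, X_test, y_test are placeholders; in recent XGBoost versions eval_metric is passed to the estimator rather than to fit):

from xgboost import XGBRegressor

# sketch only: eval_metric is set on the estimator, eval_set on fit()
model = XGBRegressor(objective='reg:squarederror', eval_metric='mae')
model.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_test, y_test)])
print(model.evals_result())  # per-round MAE for each evaluation set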


1 Answer

A little bit of theory first, sorry! You asked for the gradient and Hessian of the MAE; however, the MAE is not continuously twice differentiable, so calculating the first and second derivatives becomes tricky. Below we can see the "kink" at x = 0 which prevents the MAE from being continuously differentiable.

Moreover, the second derivative is zero at all the points where it is well behaved. In XGBoost, the second derivative is used as a denominator in the leaf weights, and when it is zero this creates serious numerical errors.
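To make the problem concrete, here is a sketch (an illustration of the failure, not something to train with) of what a naive MAE objective would look like: the gradient is just the sign of the residual and the Hessian is identically zero, which is exactly the zero-denominator problem described above.

import numpy as np

def mae_obj_naive(preds, dtrain):
    # illustration only: why raw MAE breaks down as an XGBoost objective
    d = preds - dtrain.get_label()
    grad = np.sign(d)         # undefined at d = 0 (the "kink")
    hess = np.zeros_like(d)   # zero Hessian -> division by zero in the leaf weights
    return grad, hess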

Given these complexities, our best bet is to try to approximate the MAE using some other, nicely behaved function. Let's take a look.

(Figure: some different loss functions that approximate the absolute value)

We can see above that there are several functions that approximate the absolute value. Clearly, for very small values, the Squared Error (MSE) is a fairly good approximation of the MAE. However, I assume that this is not sufficient for your use case.

Huber loss is a well-documented loss function. However, it is not smooth, so we cannot guarantee smooth derivatives. We can approximate it using the Pseudo-Huber function, which can be implemented in Python XGBoost as follows:

import numpy as np
import xgboost as xgb

dtrain = xgb.DMatrix(x_train, label=y_train)
dtest = xgb.DMatrix(x_test, label=y_test)

param = {'max_depth': 5}
num_round = 10

def huber_approx_obj(preds, dtrain):
    d = preds - dtrain.get_label()  # remove .get_label() for the sklearn API
    h = 1  # h is delta in the graphic
    scale = 1 + (d / h) ** 2
    scale_sqrt = np.sqrt(scale)
    grad = d / scale_sqrt
    hess = 1 / scale / scale_sqrt
    return grad, hess

bst = xgb.train(param, dtrain, num_round, obj=huber_approx_obj)
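A possible follow-up (an assumption on my part, not from the original answer): the custom objective only replaces the training loss, so you can still monitor plain MAE on the hold-out set through eval_metric, for example,

param = {'max_depth': 5, 'eval_metric': 'mae'}
bst = xgb.train(param, dtrain, num_round,
                evals=[(dtrain, 'train'), (dtest, 'test')],
                obj=huber_approx_obj)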

Other objective functions can be used by swapping in a different callable for obj=huber_approx_obj.

Fair loss is not well documented at all, but it seems to work rather well. The Fair loss function is:

Fair loss: L(x) = c * |x| - c^2 * log(|x| / c + 1)

It can be implemented as follows:

def fair_obj(preds, dtrain):
    """Fair loss: y = c * abs(x) - c**2 * np.log(abs(x)/c + 1)"""
    x = preds - dtrain.get_label()
    c = 1
    den = abs(x) + c
    grad = c * x / den
    hess = c * c / den ** 2
    return grad, hess

This code is taken and adapted from the second place solution in the Kaggle Allstate Challenge.

The Log-Cosh loss function can be implemented in the same way:

def log_cosh_obj(preds, dtrain):
    x = preds - dtrain.get_label()
    grad = np.tanh(x)
    hess = 1 / np.cosh(x) ** 2
    return grad, hess

Finally, you can create your own custom loss functions using the above functions as templates.
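For example, a hedged sketch of such a template: make_huber_obj is a hypothetical factory (not part of the original answer) that wraps the Pseudo-Huber objective in a closure, so the delta parameter h can be tuned without editing the function body.

def make_huber_obj(h):
    # hypothetical factory: returns a Pseudo-Huber objective with delta = h
    def obj(preds, dtrain):
        d = preds - dtrain.get_label()
        scale = 1 + (d / h) ** 2
        scale_sqrt = np.sqrt(scale)
        return d / scale_sqrt, 1 / scale / scale_sqrt
    return obj

bst = xgb.train(param, dtrain, num_round, obj=make_huber_obj(h=2.0))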

Little Bobby Tables answered Sep 28 '22 21:09