 

xgboost : The meaning of the base_score parameter

In the documentation of xgboost I read:

base_score [default=0.5] : the initial prediction score of all instances, global bias

What is the meaning of this phrase? Is the base score the prior probability of the Event of Interest in the Dataset? I.e. in a dataset of 1,000 observations with 300 Positives and 700 Negatives the base score would be 0.3?

If not, what would it be?

Your advice will be appreciated.

rf7 asked Dec 01 '17 15:12

People also ask

What is regularization parameter in XGBoost?

The regularization parameters act directly on the weights: lambda - L2 regularization. This term is a constant that is added to the second derivative (Hessian) of the loss function during gain and weight (prediction) calculations. This parameter can both shift which splits are taken and shrink the weights.
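To make the description above concrete, here is a simplified Python sketch of how lambda enters the leaf-weight and split-gain formulas from the XGBoost paper, where G and H are the sums of gradients and Hessians over the instances in a node (a sketch of the math, not the library's internals; gamma and other regularizers are omitted):

```python
# Simplified sketch of XGBoost's leaf weight and split gain.
# G/H are the sums of first/second derivatives (gradients/Hessians)
# of the loss over the instances in a node.

def leaf_weight(G, H, lam):
    # Optimal leaf value: lambda is added to the Hessian,
    # shrinking the weight toward zero.
    return -G / (H + lam)

def split_gain(G_left, H_left, G_right, H_right, lam):
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_left, H_left) + score(G_right, H_right)
                  - score(G_left + G_right, H_left + H_right))

# Larger lambda shrinks the leaf weight:
print(leaf_weight(-4.0, 8.0, lam=0.0))  # 0.5
print(leaf_weight(-4.0, 8.0, lam=2.0))  # 0.4
```

Because lambda sits in the denominator of both formulas, raising it simultaneously shrinks leaf weights and lowers every candidate split's gain, which is how it can change which splits are taken.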

What are the Hyperparameters of XGBoost?

Arguably, there are six hyperparameters for XGBoost that are the most important, defined as those with the highest probability of the algorithm yielding the most accurate, unbiased results the quickest without over-fitting: (1) how many sub-trees to train; (2) the maximum tree depth (a regularization ...

What is Reg_lambda in XGBoost?

reg_lambda : L2 regularization term. L2 encourages smaller weights; this can be more useful in tree models, where zeroing out features might not make much sense. min_child_weight : similar to gamma, as it performs regularization at the splitting step. It is the minimum Hessian weight required to create a new node.
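The min_child_weight check described above can be sketched as a simple guard on the Hessian sums of the two children (a hypothetical helper for illustration, not the library's actual code):

```python
def split_allowed(H_left, H_right, min_child_weight):
    # A candidate split is kept only if each child carries at least
    # min_child_weight total Hessian; otherwise it is pruned.
    return H_left >= min_child_weight and H_right >= min_child_weight

print(split_allowed(3.0, 5.0, min_child_weight=1.0))  # True
print(split_allowed(0.5, 5.0, min_child_weight=1.0))  # False
```

For squared-error loss the Hessian is 1 per instance, so min_child_weight then reads as a minimum number of samples per leaf; for logistic loss it weighs instances by p(1-p) instead.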

What is Max depth XGBoost?

The maximum depth can be specified in the XGBClassifier and XGBRegressor wrapper classes for XGBoost in the max_depth parameter. This parameter takes an integer value and defaults to a value of 3: model = XGBClassifier(max_depth=3)


1 Answer

I think your understanding is correct: in your example the base score could be set to 0.3, or you can simply leave it at the default of 0.5. For highly imbalanced data, initializing it to a meaningful prior can improve the learning process. Theoretically, as long as you choose an appropriate learning rate and train for enough rounds, the starting base score shouldn't affect the final result. See the author's answer in the issue linked below.
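To make the "global bias" reading concrete, here is a minimal sketch, assuming the binary:logistic objective, where base_score acts as a constant log-odds offset that the trees' outputs are added to (an illustration of the idea, not XGBoost's actual code path):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def predict_proba(tree_outputs, base_score=0.5):
    # base_score is converted to log-odds and used as the starting margin;
    # each boosting round adds its tree's output on top of it.
    margin = logit(base_score) + sum(tree_outputs)
    return sigmoid(margin)

# Before any trees are trained, every instance's prediction is base_score:
print(predict_proba([], base_score=0.3))
```

This also shows why the starting value washes out with enough rounds: the trees are free to add any margin on top of the fixed offset, so a poor base_score just means a few extra rounds spent correcting it.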

Reference: https://github.com/dmlc/xgboost/issues/799

Yue answered Sep 21 '22 20:09