Could someone explain how the Cover column in the xgboost R package is calculated in the xgb.model.dt.tree function?
In the documentation it says that Cover "is a metric to measure the number of observations affected by the split".
When you run the following code, given in the xgboost documentation for this function, Cover for node 0 of tree 0 is 1628.2500.
data(agaricus.train, package = 'xgboost')
# Both datasets are lists with two items: a sparse matrix and labels
# (labels = the outcome column which will be learned).
# Each column of the sparse matrix is a feature in one-hot encoding format.
train <- agaricus.train
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
               eta = 1, nthread = 2, nround = 2, objective = "binary:logistic")
# agaricus.train$data@Dimnames[[2]] holds the column names of the sparse matrix.
xgb.model.dt.tree(agaricus.train$data@Dimnames[[2]], model = bst)
There are 6513 observations in the train dataset, so can anyone explain why Cover for node 0 of tree 0 is a quarter of this number (1628.25)?
Also, Cover for node 1 of tree 1 is 788.852; how is this number calculated?
Any help would be much appreciated. Thanks.
Cover is defined in xgboost as:

"the sum of second order gradient of training data classified to the leaf, if it is square loss, this simply corresponds to the number of instances in that branch. Deeper in the tree a node is, lower this metric will be"
https://github.com/dmlc/xgboost/blob/f5659e17d5200bd7471a2e735177a81cb8d3012b/R-package/man/xgb.plot.tree.Rd Not particularly well documented....
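Written out (my notation, not the package docs'), the Cover of a node is the sum of the second derivatives (Hessians) of the loss over the training rows that reach that node, where $\hat{m}_i$ is the current raw prediction (margin) for row $i$:

$$\mathrm{Cover}(\text{node}) = \sum_{i \in \text{node}} h_i, \qquad h_i = \frac{\partial^2 \ell(y_i, \hat{m}_i)}{\partial \hat{m}_i^{\,2}}$$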
In order to calculate the Cover, we need to know the predictions at that point in the tree and the second derivative of the loss function with respect to those predictions.
Luckily for us, the prediction for every data point (all 6513 of them) at node 0-0 in your example is 0.5. That is the global default: the initial prediction at t = 0 is 0.5.
base_score [default=0.5]: the initial prediction score of all instances, global bias
http://xgboost.readthedocs.org/en/latest/parameter.html
The gradient of the binary logistic loss (which is your objective function) is p - y, where p is your prediction and y is the true label.
The Hessian, which is what we need here, is p * (1 - p). Note that the Hessian can be computed without y, the true labels.
So, bringing it home:
6513 * 0.5 * (1 - 0.5) = 1628.25
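A quick sketch to verify that in R, reusing the train object from your code; the only assumption is that the initial prediction is the default base_score of 0.5 for every row:

# Before any trees are built, every row's prediction is base_score = 0.5,
# so each of the 6513 rows contributes a Hessian of 0.5 * (1 - 0.5) = 0.25.
p0 <- rep(0.5, nrow(train$data))
sum(p0 * (1 - p0))
# [1] 1628.25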
In the second tree, the predictions at that point are no longer all 0.5, so let's get the predictions after one tree:
p <- predict(bst, newdata = train$data, ntreelimit = 1)  # use only the first tree for the prediction
head(p)
# [1] 0.8471184 0.1544077 0.1544077 0.8471184 0.1255700 0.1544077
sum(p * (1 - p))  # sum of the Hessians in that node (the root node sees all the data)
# [1] 788.8521
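You can cross-check that number against the Cover column itself. A small sketch, assuming the bst model from the question and a package version whose xgb.model.dt.tree output has Tree, Node and Cover columns:

dt <- xgb.model.dt.tree(colnames(train$data), model = bst)
dt[Tree == 1 & Node == 0, Cover]  # the root node of the second tree
# [1] 788.8521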
Note: for linear (squared error) regression the Hessian is always 1, so the Cover simply indicates how many examples fall in that leaf.
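If you want to see that for yourself, here is a sketch (my own check, not part of the explanation above); it refits the same data with the old squared-error objective name "reg:linear" (newer releases call it "reg:squarederror"), so the root Cover should come out as the row count, 6513:

bst_reg <- xgboost(data = train$data, label = train$label, max.depth = 2,
                   eta = 1, nthread = 2, nround = 1, objective = "reg:linear")
# the Cover reported for the root node (Node 0) equals the number of training rows
xgb.model.dt.tree(colnames(train$data), model = bst_reg)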
The big takeaway is that Cover is defined by the Hessian of the objective function. There is lots of information out there on deriving the gradient and Hessian of the binary logistic function.
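For completeness, the derivation is short; differentiating with respect to the raw margin score $m$, with $p = \sigma(m)$ the predicted probability:

$$\ell(y, m) = -\bigl[\, y \log p + (1 - y)\log(1 - p) \,\bigr], \qquad p = \sigma(m)$$
$$\frac{\partial \ell}{\partial m} = p - y, \qquad \frac{\partial^2 \ell}{\partial m^2} = p\,(1 - p)$$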
These slides are helpful in seeing why xgboost uses Hessians as weights, and they also explain how xgboost splits differently from standard trees: https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf