Could someone explain how the Cover column in the xgboost R package is calculated in the xgb.model.dt.tree function?
In the documentation it says that Cover "is a metric to measure the number of observations affected by the split".
When you run the following code, given in the xgboost documentation for this function, Cover for node 0 of tree 0 is 1628.2500.
data(agaricus.train, package = 'xgboost')
# Both datasets are lists with two items: a sparse matrix and labels
# (labels = the outcome column which will be learned).
# Each column of the sparse matrix is a feature in one-hot encoding format.
train <- agaricus.train
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
               eta = 1, nthread = 2, nround = 2, objective = "binary:logistic")
# agaricus.train$data@Dimnames[[2]] holds the column names of the sparse matrix.
xgb.model.dt.tree(agaricus.train$data@Dimnames[[2]], model = bst)
There are 6513 observations in the train dataset, so can anyone explain why Cover for node 0 of tree 0 is a quarter of this number (1628.25)?
Also, Cover for node 1 of tree 1 is 788.852; how is this number calculated?
Any help would be much appreciated. Thanks.
Cover is defined in xgboost as:

"the sum of second order gradient of training data classified to the leaf, if it is square loss, this simply corresponds to the number of instances in that branch. Deeper in the tree a node is, lower this metric will be"
https://github.com/dmlc/xgboost/blob/f5659e17d5200bd7471a2e735177a81cb8d3012b/R-package/man/xgb.plot.tree.Rd Not particularly well documented....
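Written out (my notation, not the package docs'), the Cover of a node is the sum of the second derivatives (Hessians) of the loss over the training rows that reach that node, where $\hat{m}_i$ is the current raw prediction (margin) for row $i$:

$$\mathrm{Cover}(\text{node}) = \sum_{i \in \text{node}} h_i, \qquad h_i = \frac{\partial^2 \ell(y_i, \hat{m}_i)}{\partial \hat{m}_i^{\,2}}$$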
In order to calculate the Cover, we need to know the predictions at that point in the tree and the second derivative of the loss function with respect to those predictions.
Luckily for us, the prediction for every data point (all 6513 of them) at node 0-0 in your example is 0.5. That is the global default: the initial prediction at t = 0 is 0.5.
base_score [default=0.5]: the initial prediction score of all instances, global bias
http://xgboost.readthedocs.org/en/latest/parameter.html
The gradient of the binary logistic loss (which is your objective function) is p - y, where p is your prediction and y is the true label.
The Hessian, which is what we need here, is p * (1 - p). Note that the Hessian can be computed without y, the true labels.
So, bringing it home:
6513 * 0.5 * (1 - 0.5) = 1628.25
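A quick sketch to verify that in R, reusing the train object from your code; the only assumption is that the initial prediction is the default base_score of 0.5 for every row:

# Before any trees are built, every row's prediction is base_score = 0.5,
# so each of the 6513 rows contributes a Hessian of 0.5 * (1 - 0.5) = 0.25.
p0 <- rep(0.5, nrow(train$data))
sum(p0 * (1 - p0))
# [1] 1628.25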
In the second tree, the predictions at that point are no longer all 0.5, so let's get the predictions after one tree:
p <- predict(bst, newdata = train$data, ntreelimit = 1)  # use only the first tree for the prediction
head(p)
# [1] 0.8471184 0.1544077 0.1544077 0.8471184 0.1255700 0.1544077
sum(p * (1 - p))  # sum of the Hessians in that node (the root node sees all the data)
# [1] 788.8521
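You can cross-check that number against the Cover column itself. A small sketch, assuming the bst model from the question and a package version whose xgb.model.dt.tree output has Tree, Node and Cover columns:

dt <- xgb.model.dt.tree(colnames(train$data), model = bst)
dt[Tree == 1 & Node == 0, Cover]  # the root node of the second tree
# [1] 788.8521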
Note: for linear (squared error) regression the Hessian is always 1, so the Cover simply indicates how many examples fall in that leaf.
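If you want to see that for yourself, here is a sketch (my own check, not part of the explanation above); it refits the same data with the old squared-error objective name "reg:linear" (newer releases call it "reg:squarederror"), so the root Cover should come out as the row count, 6513:

bst_reg <- xgboost(data = train$data, label = train$label, max.depth = 2,
                   eta = 1, nthread = 2, nround = 1, objective = "reg:linear")
# the Cover reported for the root node (Node 0) equals the number of training rows
xgb.model.dt.tree(colnames(train$data), model = bst_reg)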
The big takeaway is that Cover is defined by the Hessian of the objective function. There is lots of information out there on deriving the gradient and Hessian of the binary logistic function.
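For completeness, the derivation is short; differentiating with respect to the raw margin score $m$, with $p = \sigma(m)$ the predicted probability:

$$\ell(y, m) = -\bigl[\, y \log p + (1 - y)\log(1 - p) \,\bigr], \qquad p = \sigma(m)$$
$$\frac{\partial \ell}{\partial m} = p - y, \qquad \frac{\partial^2 \ell}{\partial m^2} = p\,(1 - p)$$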
These slides are helpful in seeing why xgboost uses Hessians as weights, and they also explain how xgboost splits differently from standard trees: https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf