Could someone explain how the `Quality` column in the xgboost R package is calculated in the `xgb.model.dt.tree` function?

In the documentation it says that `Quality` "is the gain related to the split in this specific node".

When you run the following code, given in the xgboost documentation for this function, `Quality` for node 0 of tree 0 is 4000.53, yet I calculate the `Gain` as 2002.848:
```r
library(xgboost)

data(agaricus.train, package = 'xgboost')
train <- agaricus.train
X <- train$data
y <- train$label

bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
               eta = 1, nthread = 2, nround = 2, objective = "binary:logistic")
xgb.model.dt.tree(agaricus.train$data@Dimnames[[2]], model = bst)
```
```r
# Initial prediction is 0.5 for every instance (the base_score default)
p <- rep(0.5, nrow(X))

# Indices of the two sides of the first split (on 'odor=none')
L <- which(X[, 'odor=none'] == 0)
R <- which(X[, 'odor=none'] == 1)
pL <- p[L]
pR <- p[R]
yL <- y[L]
yR <- y[R]

# Gradient (G) and hessian (H) sums for log loss
GL <- sum(pL - yL)
GR <- sum(pR - yR)
G  <- sum(p - y)
HL <- sum(pL * (1 - pL))
HR <- sum(pR * (1 - pR))
H  <- sum(p * (1 - p))

gain <- 0.5 * (GL^2 / HL + GR^2 / HR - G^2 / H)
gain  # 2002.848
```
I understand that `Gain` is given by the following formula:

$$\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma$$

Since we are using log loss, $G$ is the sum of $p - y$ and $H$ is the sum of $p(1 - p)$, and $\gamma$ and $\lambda$ in this instance are both zero.
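(For completeness, the $p - y$ and $p(1 - p)$ terms are the first and second derivatives of the log loss with respect to the raw margin $\hat{y}$, where $p = \sigma(\hat{y})$:)

$$\ell(y, \hat{y}) = -\bigl[y \log p + (1 - y)\log(1 - p)\bigr], \qquad g = \frac{\partial \ell}{\partial \hat{y}} = p - y, \qquad h = \frac{\partial^2 \ell}{\partial \hat{y}^2} = p(1 - p)$$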
Can anyone identify where I am going wrong?
OK, I think I've worked it out. The value for `reg_lambda` is not 0 by default as given in the documentation, but is actually 1 (see param.h).

Also, it appears that the factor of a half is not applied when calculating the gain, so the `Quality` column is double what you would expect.

Lastly, I also don't think `gamma` (also called `min_split_loss`) is applied to this calculation (see updater_histmaker-inl.hpp). Instead, gamma is used to decide whether to prune the split, but it is not subtracted from the gain itself, contrary to what the documentation suggests.
If you apply these changes, you do indeed get 4000.53 as the `Quality` for node 0 of tree 0, as in the original question. I'll raise this as an issue with the xgboost developers so the documentation can be corrected accordingly.
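For reference, applying all three corrections to the question's calculation (reusing the `GL`, `GR`, `G`, `HL`, `HR`, `H` values computed above) reproduces the reported value:

```r
lambda <- 1  # the actual default, despite what the documentation says

# No factor of a half, lambda in every denominator, and no gamma term
quality <- GL^2 / (HL + lambda) + GR^2 / (HR + lambda) - G^2 / (H + lambda)
quality  # 4000.53, matching node 0 of tree 0
```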