
pred_leaf in LightGBM

While going through the LightGBM docs I found that predict supports a pred_leaf argument. The docs say

pred_leaf (bool, optional (default=False)) – Whether to predict leaf index.

However, when I run something like the following:

# data: a single sample with 28 features, shape (1, 28)
# gbm: a booster trained with num_boost_round = X

embedding = gbm.predict(data, pred_leaf=True)
print(embedding.shape)  # (1, X)
print(embedding[0, :])  # [29,  2,  8, 26,  2,  2, 16, 18, 25, 30, 16, 25,  0, 17, 15]

I don't understand why the output is an array with one entry per boosting round rather than a one-hot vector or a single scalar. The docs say it predicts the leaf index, so what do these values mean? And can this output be used as an "embedding" for another model?

PS: I would have posted this on stats.stackexchange, but 1) the question is specific to LightGBM and 2) there is no lightgbm tag there.

IanQ asked Sep 03 '25 10:09



1 Answer

With pred_leaf=True, LightGBM's predict returns an array of shape (n_samples, n_trees) containing int32 values.

Each integer entry in the matrix indicates the predicted leaf index of each sample in each tree.

Leaf indices are numbered independently within each tree, so the same leaf number can appear in many different columns while referring to a different leaf in each one.
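
Because an index only has meaning within its own tree, one common way to use this output as an "embedding" is to one-hot encode each column separately and concatenate the blocks. The sketch below does this in plain Python for the row shown in the question. It assumes every tree has at most 31 leaves (LightGBM's default num_leaves); the helper one_hot_leaves is hypothetical and not part of LightGBM.

```python
# Leaf indices for one sample, copied from the question (one entry per tree).
leaf_row = [29, 2, 8, 26, 2, 2, 16, 18, 25, 30, 16, 25, 0, 17, 15]

# Assumption: each tree has at most 31 leaves (LightGBM's default num_leaves).
NUM_LEAVES = 31

def one_hot_leaves(row, num_leaves):
    """Concatenate one one-hot block of length num_leaves per tree."""
    encoded = [0] * (len(row) * num_leaves)
    for tree_idx, leaf_idx in enumerate(row):
        # Offset by the tree's block so equal indices in different trees
        # map to different positions.
        encoded[tree_idx * num_leaves + leaf_idx] = 1
    return encoded

features = one_hot_leaves(leaf_row, NUM_LEAVES)
print(len(features))  # 15 trees * 31 leaves = 465
print(sum(features))  # 15: exactly one active position per tree
```

The resulting sparse vector can then be passed to a downstream model such as a linear classifier. Whether that helps depends on the task, so treat it as something to validate rather than a guaranteed improvement.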

As for its behaviour, this LightGBM prediction option mimics an analogous one in XGBoost (https://xgboost.readthedocs.io/en/latest/python/python_api.html).

Luca Massaron answered Sep 05 '25 00:09



