This is in reference to understanding, internally, how the probabilities for a class are predicted using LightGBM.
Other packages, like sklearn, provide thorough detail for their classifiers. For example:
LogisticRegression returns:
Probability estimates.
The returned estimates for all classes are ordered by the label of classes.
For a multi_class problem, if multi_class is set to be “multinomial” the softmax function is used to find the predicted probability of each class. Else use a one-vs-rest approach, i.e calculate the probability of each class assuming it to be positive using the logistic function. and normalize these values across all the classes.
RandomForest returns:
Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf.
There are additional Stack Overflow questions which provide additional details, such as for:
Support Vector Machines
Multilayer Perceptron
I am trying to uncover those same details for LightGBM's predict_proba function. The documentation does not list the details of how the probabilities are calculated.
The documentation simply states:
Return the predicted probability for each class for each sample.
The source code is below:
def predict_proba(self, X, raw_score=False, start_iteration=0, num_iteration=None,
                  pred_leaf=False, pred_contrib=False, **kwargs):
    """Return the predicted probability for each class for each sample.

    Parameters
    ----------
    X : array-like or sparse matrix of shape = [n_samples, n_features]
        Input features matrix.
    raw_score : bool, optional (default=False)
        Whether to predict raw scores.
    start_iteration : int, optional (default=0)
        Start index of the iteration to predict.
        If <= 0, starts from the first iteration.
    num_iteration : int or None, optional (default=None)
        Total number of iterations used in the prediction.
        If None, if the best iteration exists and start_iteration <= 0, the best iteration is used;
        otherwise, all iterations from ``start_iteration`` are used (no limits).
        If <= 0, all iterations from ``start_iteration`` are used (no limits).
    pred_leaf : bool, optional (default=False)
        Whether to predict leaf index.
    pred_contrib : bool, optional (default=False)
        Whether to predict feature contributions.

        .. note::

            If you want to get more explanations for your model's predictions using SHAP values,
            like SHAP interaction values,
            you can install the shap package (https://github.com/slundberg/shap).
            Note that unlike the shap package, with ``pred_contrib`` we return a matrix with an extra
            column, where the last column is the expected value.

    **kwargs
        Other parameters for the prediction.

    Returns
    -------
    predicted_probability : array-like of shape = [n_samples, n_classes]
        The predicted probability for each class for each sample.
    X_leaves : array-like of shape = [n_samples, n_trees * n_classes]
        If ``pred_leaf=True``, the predicted leaf of every tree for each sample.
    X_SHAP_values : array-like of shape = [n_samples, (n_features + 1) * n_classes] or list with n_classes length of such objects
        If ``pred_contrib=True``, the feature contributions for each sample.
    """
    result = super(LGBMClassifier, self).predict(X, raw_score, start_iteration, num_iteration,
                                                 pred_leaf, pred_contrib, **kwargs)
    if callable(self._objective) and not (raw_score or pred_leaf or pred_contrib):
        warnings.warn("Cannot compute class probabilities or labels "
                      "due to the usage of customized objective function.\n"
                      "Returning raw scores instead.")
        return result
    elif self._n_classes > 2 or raw_score or pred_leaf or pred_contrib:
        return result
    else:
        return np.vstack((1. - result, result)).transpose()
How can I understand how exactly the predict_proba function for LightGBM is working internally?
LightGBM, like all gradient boosting methods for classification, essentially combines decision trees and logistic regression. We start with the same logistic (sigmoid) function representing the probabilities; softmax is its generalization to more than two classes:
P(y = 1 | X) = 1/(1 + exp(-Xw))
The interesting twist is that the feature matrix X is composed from the terminal nodes of a decision tree ensemble. These are all then weighted by w, a parameter that must be learned. The mechanism used to learn the weights depends on the precise learning algorithm used. Similarly, the construction of X also depends on the algorithm. LightGBM, for example, introduced two novel features to which it owes its training-speed improvements over XGBoost: "Gradient-based One-Side Sampling" and "Exclusive Feature Bundling". Generally though, each row corresponds to a sample and each column to a terminal leaf, with a row marking the leaves that its sample falls into.
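To make the "leaf indicators times weights" picture concrete, here is a minimal sketch in plain Python. The three stumps and their leaf values are hypothetical, not taken from a real model. Multiplying a one-hot row of X by w amounts to summing the value stored in the leaf a sample reaches in each tree; that sum is the raw score, which is then passed through the sigmoid with the conventional sign, 1/(1 + exp(-z)). The sketch assumes LightGBM's `sigmoid` scaling parameter is left at its default of 1.0.

```python
import math

# Hypothetical toy ensemble: each "tree" is a single split (a stump) on one
# feature, with a learned output value stored in each of its two leaves.
TREES = [
    {"feature": 0, "threshold": 0.5, "left": -0.8, "right": 0.6},
    {"feature": 1, "threshold": 2.0, "left": 0.3, "right": -0.4},
    {"feature": 0, "threshold": 1.5, "left": -0.1, "right": 0.9},
]


def leaf_value(tree, x):
    """Route one sample to a terminal leaf and return that leaf's value."""
    if x[tree["feature"]] <= tree["threshold"]:
        return tree["left"]
    return tree["right"]


def raw_score(x, trees=TREES):
    """Raw (log-odds) score: the sum of the leaf values reached in every tree."""
    return sum(leaf_value(tree, x) for tree in trees)


def sigmoid(z):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba_binary(x):
    """Return [P(class 0), P(class 1)] for one sample."""
    p1 = sigmoid(raw_score(x))
    return [1.0 - p1, p1]


sample = [2.0, 1.0]
print(raw_score(sample))
print(predict_proba_binary(sample))
```

In a real model the leaf values already include the learning-rate shrinkage, so no extra weighting is needed at prediction time.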
So here is what the docs could say...
Probability estimates.
The predicted class probabilities of an input sample are computed as the softmax of the weighted terminal leaves from the decision tree ensemble corresponding to the provided sample.
For further details, you'd have to delve into the details of boosting, XGBoost, and finally the LightGBM paper, but that seems a bit heavy handed given the other documentation examples you've given.
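For the multiclass case, the same idea can be sketched as follows. The leaf values are hypothetical; the layout (one tree per class per boosting iteration) is an assumption consistent with the `n_trees * n_classes` shape mentioned in the predict_proba docstring. Each class gets its own raw score, and softmax turns the scores into probabilities that sum to one.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one sample's per-class raw scores."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical leaf values reached by one sample: one row per boosting
# iteration, one column per class (one tree per class per iteration).
leaf_values = [
    [0.4, -0.2, -0.1],
    [0.3, -0.1, -0.3],
    [0.5, 0.0, -0.2],
]

# Per-class raw score = sum of that class's leaf values over all iterations.
raw_scores = [sum(column) for column in zip(*leaf_values)]
probabilities = softmax(raw_scores)

print(raw_scores)
print(probabilities)
```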
Below we can see an illustration of what each method is calling under the hood. First, the predict_proba() method of the class LGBMClassifier is calling the predict() method from LGBMModel (it inherits from it).
LGBMClassifier.predict_proba() (inherits from LGBMModel)
    |---->LGBMModel().predict() (calls LightGBM Booster)
            |---->Booster.predict()
Then, it calls the predict() method from the LightGBM Booster (the Booster class). In order to call this method, the Booster should be trained first.
Basically, the Booster is the one that generates the predicted value for each sample by calling its predict() method. See below for a detailed follow-up on how this booster works.
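The delegation chain can be mimicked with three small stand-in classes. Everything here is hypothetical (the `Toy*` names and the fixed outputs); only the shape of the calls and the final binary branch follow the predict_proba source quoted in the question. Plain lists stand in for NumPy arrays.

```python
class ToyBooster:
    """Stand-in for lightgbm.Booster: returns P(class 1) for each sample."""

    def predict(self, X, raw_score=False):
        # Hypothetical fixed outputs; a real Booster evaluates its trees
        # (and this toy ignores raw_score entirely).
        return [0.9 if row[0] > 0 else 0.2 for row in X]


class ToyModel:
    """Stand-in for LGBMModel: owns a trained booster and delegates to it."""

    def __init__(self):
        self._Booster = ToyBooster()
        self._n_classes = 2

    def predict(self, X, raw_score=False):
        return self._Booster.predict(X, raw_score=raw_score)


class ToyClassifier(ToyModel):
    """Stand-in for LGBMClassifier."""

    def predict_proba(self, X, raw_score=False):
        result = super().predict(X, raw_score)
        if self._n_classes > 2 or raw_score:
            return result
        # Binary case: expand P(class 1) into [P(class 0), P(class 1)] rows,
        # the list equivalent of np.vstack((1. - result, result)).transpose().
        return [[1.0 - p, p] for p in result]


print(ToyClassifier().predict_proba([[1.0], [-1.0]]))
```

The point of the sketch is that the classifier adds almost nothing of its own: in the binary case it only turns a single column of class-1 probabilities into two columns.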
We seek to answer the question: how does the LightGBM booster work? By going through the Python code we can get a general idea of how it is trained and updated. There are, however, further references to the C++ libraries of LightGBM that I'm not in a position to explain, so what follows is only a general glimpse of the Booster workflow.
The _Booster of LGBMModel is initialized by calling the train() function. On line 595 of sklearn.py we see the following code:
self._Booster = train(params, train_set,
                      self.n_estimators, valid_sets=valid_sets, valid_names=eval_names,
                      early_stopping_rounds=early_stopping_rounds,
                      evals_result=evals_result, fobj=self._fobj, feval=feval,
                      verbose_eval=verbose, feature_name=feature_name,
                      callbacks=callbacks, init_model=init_model)
Note. train() comes from engine.py.
Inside train() we see that the Booster is initialized (line 231):
# construct booster
try:
    booster = Booster(params=params, train_set=train_set)
    ...
and updated at every training iteration (line 242):
for i in range_(init_iteration, init_iteration + num_boost_round):
    ...
    ...
    booster.update(fobj=fobj)
    ...
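The construct-then-update pattern can be sketched with a toy booster. This is not LightGBM's algorithm: the hypothetical `ToyBooster` below has no splits at all, and each `update()` simply adds one constant "tree" obtained from a damped Newton step on the binary log loss. It only illustrates why the booster has to be updated `num_boost_round` times before its predictions are useful.

```python
import math


class ToyBooster:
    """Hypothetical stand-in for Booster: each update() adds one constant tree."""

    def __init__(self, labels, learning_rate=0.5):
        self.labels = labels
        self.learning_rate = learning_rate
        self.leaf_values = []  # one value per boosting iteration

    def raw_score(self):
        """Current raw score, identical for all samples in this toy."""
        return sum(self.leaf_values)

    def update(self):
        """One boosting iteration: a damped Newton step on the log loss."""
        p = 1.0 / (1.0 + math.exp(-self.raw_score()))
        grad = sum(p - y for y in self.labels)
        hess = len(self.labels) * p * (1.0 - p)
        self.leaf_values.append(-self.learning_rate * grad / hess)


labels = [1, 1, 1, 0]
booster = ToyBooster(labels)  # construct booster
num_boost_round = 20
for i in range(num_boost_round):  # update it once per training iteration
    booster.update()

probability = 1.0 / (1.0 + math.exp(-booster.raw_score()))
print(probability)
```

With no features to split on, the predicted probability drifts toward the fraction of positive labels, which is the best a constant model can do.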
How does booster.update() work?
To understand how the update() method works we should go to line 2315 of basic.py. Here, we see that this function updates the Booster for one iteration.
There are two alternatives to update the booster, depending on whether or not you provide an objective function.
Objective function is None
On line 2367 we get to the following code:
if fobj is None:
    ...
    ...
    _safe_call(_LIB.LGBM_BoosterUpdateOneIter(
        self.handle,
        ctypes.byref(is_finished)))
    self.__is_predicted_cur_iter = [False for _ in range_(self.__num_dataset)]
    return is_finished.value == 1
Notice that, as the objective function (fobj) is not provided, it updates the booster by calling LGBM_BoosterUpdateOneIter from _LIB. In short, _LIB is the loaded C++ LightGBM library.
What is _LIB?
_LIB is a variable that stores the loaded LightGBM library by calling _load_lib() (line 29 of basic.py). Then _load_lib() loads the LightGBM library by finding on your system the path to lib_lightgbm.dll (Windows) or lib_lightgbm.so (Linux).
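A rough sketch of that lookup, under stated assumptions: the function names `lib_filename` and `load_lib` and the directory-search strategy are hypothetical and are not LightGBM's actual `_load_lib()`; only the two file names come from the text above. `ctypes.cdll.LoadLibrary` is the standard-library call for loading a shared library from a path.

```python
import ctypes
import os
import platform


def lib_filename(system):
    """Pick the shared-library file name for the two platforms named above."""
    if system == "Windows":
        return "lib_lightgbm.dll"
    return "lib_lightgbm.so"


def load_lib(search_dirs, system):
    """Return a loaded library handle, or None if the file is not found."""
    name = lib_filename(system)
    for directory in search_dirs:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return ctypes.cdll.LoadLibrary(path)
    return None


# Hypothetical usage: look only in the current directory.
_LIB = load_lib([os.getcwd()], platform.system())
print(_LIB is not None)
```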
When a custom objective function is provided, we get to the following case:
else:
    ...
    ...
    grad, hess = fobj(self.__inner_predict(0), self.train_set)
where __inner_predict() is a method from LightGBM's Booster (see line 1930 of basic.py for more details of the Booster class), which predicts for training and validation data. Inside __inner_predict() (line 3142 of basic.py) we see that it calls LGBM_BoosterGetPredict from _LIB to get the predictions, that is,
_safe_call(_LIB.LGBM_BoosterGetPredict(
    self.handle,
    ctypes.c_int(data_idx),
    ctypes.byref(tmp_out_len),
    data_ptr))
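To make the custom-objective branch concrete, here is a minimal sketch of an `fobj` with the `(preds, train_data)` shape used in the call above, returning the gradient and Hessian of the binary log loss. `ToyDataset` is a hypothetical stand-in for `lightgbm.Dataset` that exposes only `get_label()`, and plain lists stand in for NumPy arrays. The predictions handed to the objective are raw scores, so the sigmoid is applied inside it.

```python
import math


class ToyDataset:
    """Hypothetical stand-in for lightgbm.Dataset exposing get_label()."""

    def __init__(self, labels):
        self._labels = labels

    def get_label(self):
        return self._labels


def logloss_objective(preds, train_data):
    """Binary log-loss objective; preds are raw scores, not probabilities."""
    labels = train_data.get_label()
    probs = [1.0 / (1.0 + math.exp(-z)) for z in preds]
    grad = [p - y for p, y in zip(probs, labels)]  # first derivative
    hess = [p * (1.0 - p) for p in probs]  # second derivative
    return grad, hess


grad, hess = logloss_objective([0.0, 2.0], ToyDataset([1, 0]))
print(grad)
print(hess)
```

Per the predict_proba source quoted in the question, a callable objective makes predict_proba warn and return raw scores, so with an objective like this one the sigmoid has to be applied to the output by hand.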
Finally, after the booster has been updated once for each iteration in range_(init_iteration, init_iteration + num_boost_round), it is trained. Thus, Booster.predict() can be called by LGBMClassifier.predict_proba().
Note. The booster is trained as part of the model fitting step, specifically by LGBMModel.fit(); see line 595 of sklearn.py for code details.
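One way to convince yourself of the whole chain is to compare the two outputs of predict_proba directly. The helper below is a sketch: `proba_matches_sigmoid` and `FakeBinaryModel` are hypothetical names, and the fake model exists only so the snippet runs without LightGBM installed. It relies on the `raw_score` parameter shown in the predict_proba signature in the question and assumes a binary problem with a built-in objective.

```python
import math


def proba_matches_sigmoid(model, X, tol=1e-6):
    """Check that binary predict_proba equals the sigmoid of the raw scores."""
    raw = model.predict_proba(X, raw_score=True)  # one raw score per sample
    proba = model.predict_proba(X)  # one [P(0), P(1)] row per sample
    for z, row in zip(raw, proba):
        p1 = 1.0 / (1.0 + math.exp(-z))
        if abs(row[1] - p1) > tol or abs(row[0] - (1.0 - p1)) > tol:
            return False
    return True


class FakeBinaryModel:
    """Hypothetical stand-in so the sketch runs without LightGBM installed."""

    def predict_proba(self, X, raw_score=False):
        raw = [0.5 * row[0] for row in X]
        if raw_score:
            return raw
        probs = [1.0 / (1.0 + math.exp(-z)) for z in raw]
        return [[1.0 - p, p] for p in probs]


print(proba_matches_sigmoid(FakeBinaryModel(), [[-2.0], [0.0], [3.0]]))
```

With LightGBM installed, passing a fitted two-class LGBMClassifier and its feature matrix instead of the fake model should, under the assumptions above and default parameters, give the same answer.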