I'm having a hard time finding out what the oob_score_ attribute means for RandomForestRegressor in scikit-learn. The documentation says:
oob_score_ : float Score of the training dataset obtained using an out-of-bag estimate.
At first I thought it would return a score for each instance in the set of out-of-bag instances. But that is given by the attribute:
oob_prediction_ : array of shape = [n_samples] Prediction computed with out-of-bag estimate on the training set.
It returns an array containing the prediction for each instance. Then, going through the other methods in the documentation, I realized that the method score(X, y, sample_weight=None) returns the coefficient of determination R².
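A quick way to confirm that claim about score() is to compare it with sklearn.metrics.r2_score on the same data. The sketch below uses a synthetic dataset from make_regression and a held-out split; the dataset sizes, seeds and n_estimators are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data (arbitrary sizes and seed, for illustration only).
X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# score() on an (X, y) pair is the R^2 of model.predict(X) against y.
via_score = model.score(X_test, y_test)
via_metric = r2_score(y_test, model.predict(X_test))
print(via_score, via_metric)  # the two values should be identical
```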
Given that the attribute oob_score_ is a single float value, what does it represent? If possible, I would also like to know how it is computed.
The link to the documentation is RandomForestRegressor.
It returns exactly what the documentation says:
oob_score_ : float Score of the training dataset obtained using an out-of-bag estimate.
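where "score" is the same quantity returned by the score method:
score(X, y, sample_weight=None) returns the Coefficient of determination R².
and the out-of-bag estimate is computed from the samples that each tree did not see during training because of the bagging (bootstrap) procedure.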
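To make the out-of-bag idea concrete, here is a from-scratch sketch that bags decision trees by hand: each tree is fit on a bootstrap sample, predicts only the samples it never saw, and those predictions are averaged per sample before computing R². The helper manual_oob_r2 is hypothetical and uses NumPy's own random generator, so it illustrates the technique rather than reproducing RandomForestRegressor's exact numbers.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor


def manual_oob_r2(X, y, n_trees=100, seed=0):
    """Hypothetical helper: bag trees by hand, return (oob_prediction, oob_r2)."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    pred_sum = np.zeros(n_samples)  # running sum of OOB predictions per sample
    n_pred = np.zeros(n_samples)    # number of trees that left each sample out

    for _ in range(n_trees):
        # Bootstrap: draw n_samples row indices with replacement.
        boot = rng.integers(0, n_samples, size=n_samples)
        # Out-of-bag rows for this tree: those never drawn above.
        oob_mask = np.ones(n_samples, dtype=bool)
        oob_mask[boot] = False

        tree = DecisionTreeRegressor(random_state=0).fit(X[boot], y[boot])
        if oob_mask.any():
            pred_sum[oob_mask] += tree.predict(X[oob_mask])
            n_pred[oob_mask] += 1

    # Average the OOB predictions; samples never left out stay NaN.
    oob_pred = np.full(n_samples, np.nan)
    seen = n_pred > 0
    oob_pred[seen] = pred_sum[seen] / n_pred[seen]

    # The single float: R^2 of the OOB predictions against the training targets.
    return oob_pred, r2_score(y[seen], oob_pred[seen])


X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
oob_pred, oob_r2 = manual_oob_r2(X, y)
print(oob_r2)
```

With enough trees, every sample is left out by some tree, so oob_pred has a value for every training row, which is the role oob_prediction_ plays in scikit-learn.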
Just look at the source, lines 727-740:
predictions /= n_predictions
self.oob_prediction_ = predictions

if self.n_outputs_ == 1:
    self.oob_prediction_ = \
        self.oob_prediction_.reshape((n_samples, ))

self.oob_score_ = 0.0

for k in range(self.n_outputs_):
    self.oob_score_ += r2_score(y[:, k],
                                predictions[:, k])

self.oob_score_ /= self.n_outputs_
In other words, it is just the R² score of oob_prediction_ against the training targets, averaged over the outputs when there is more than one.
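This can be checked directly by recomputing the score from oob_prediction_ with r2_score. The sketch below assumes a reasonably recent scikit-learn (the source has since been restructured, but oob_score_ is still the R² of the out-of-bag predictions); the synthetic data, seeds and n_estimators are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Single-output case: oob_score_ should equal R^2 of oob_prediction_ vs y.
X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=42)
forest = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=42)
forest.fit(X, y)
single_r2 = r2_score(y, forest.oob_prediction_)
print(forest.oob_score_, single_r2)

# Multi-output case: oob_score_ should equal the mean of the per-output R^2.
X2, Y2 = make_regression(n_samples=300, n_features=8, n_targets=3,
                         noise=5.0, random_state=42)
forest2 = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=42)
forest2.fit(X2, Y2)
per_output = [r2_score(Y2[:, k], forest2.oob_prediction_[:, k])
              for k in range(Y2.shape[1])]
print(forest2.oob_score_, np.mean(per_output))
```

Note that oob_score=True only works with bootstrap=True (the default), since without bootstrapping no sample is ever out of bag.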