I am trying to access the out-of-bag samples associated with each tree in a RandomForestClassifier, with no luck. I found other information, such as the Gini score and split feature for each node, by looking here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx
Does anyone know if it is possible to get the out-of-bag sample for a given tree? If not, is it possible to get the 'in bag' sample (the subset of the dataset used for a specific tree) and then compute the OOB sample from the original dataset?
Thanks in advance
Out-of-bag (OOB) error, also called out-of-bag estimate, is a method of measuring the prediction error of random forests, boosted decision trees, and other machine learning models utilizing bootstrap aggregating (bagging). Bagging uses subsampling with replacement to create training samples for the model to learn from.
Each OOB sample row is passed through every decision tree (DT) that did not contain that row in its bootstrap training data, and a majority prediction is noted for each row. Lastly, the OOB score is computed as the fraction of OOB rows that are predicted correctly.
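The majority-vote scoring described above can be sketched in plain Python. This is only an illustration of the idea, not scikit-learn's code: the function name `oob_accuracy`, the toy labels, the per-tree predictions and the in-bag sets are all made up for the example.

```python
from collections import Counter

def oob_accuracy(y_true, tree_predictions, in_bag_sets):
    """Hypothetical helper: OOB accuracy by majority vote.

    tree_predictions[t][i] is tree t's predicted label for row i;
    in_bag_sets[t] is the set of row indices tree t was trained on.
    """
    correct = 0
    scored = 0
    for i, label in enumerate(y_true):
        # Only trees that did NOT train on row i are allowed to vote on it.
        votes = [preds[i]
                 for preds, bag in zip(tree_predictions, in_bag_sets)
                 if i not in bag]
        if not votes:
            continue  # row was in every bag, so it has no OOB prediction
        majority = Counter(votes).most_common(1)[0][0]
        scored += 1
        correct += int(majority == label)
    return correct / scored if scored else float("nan")

# Toy data (assumed, for illustration only).
y_true = [0, 1, 1, 0]
in_bag_sets = [{0, 1}, {1, 2}, {0, 3}]
tree_predictions = [
    [0, 1, 1, 0],  # tree 0
    [0, 1, 1, 0],  # tree 1
    [0, 0, 1, 0],  # tree 2
]
score = oob_accuracy(y_true, tree_predictions, in_bag_sets)
```

Here row 1 is only out-of-bag for tree 2, which predicts it wrongly, so three of the four rows are scored as correct.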
The random forest algorithm is made up of a collection of decision trees, and each tree in the ensemble is trained on a data sample drawn from the training set with replacement, called the bootstrap sample.
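To make the bootstrap sample concrete, here is a small sketch using only the standard library. The sample size and the seed are arbitrary assumptions; it shows that some rows are drawn several times while others are never drawn, and those never-drawn rows are the out-of-bag rows for that tree.

```python
import random
from collections import Counter

n_samples = 10           # assumed size of the training set
rng = random.Random(42)  # arbitrary seed, for reproducibility only

# Draw n_samples row indices with replacement: the bootstrap sample.
bootstrap = [rng.randrange(n_samples) for _ in range(n_samples)]

# How many times each row was drawn (rows drawn zero times are absent).
in_bag_counts = Counter(bootstrap)

# Rows that were never drawn are out-of-bag for this tree.
out_of_bag = [i for i in range(n_samples) if in_bag_counts[i] == 0]
```

For large n, roughly (1 - 1/n)^n ≈ 1/e ≈ 36.8% of the rows end up out-of-bag for any single tree.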
There's no such thing as a good oob_score; it's the difference between valid_score and oob_score that matters. Think of oob_score as a score for some subset (say, oob_set) of the training set. To learn how it's created, refer to this.
You can figure this out yourself from the source code: look at how the private _set_oob_score method of the random forest works. Every tree estimator in scikit-learn has its own seed for the pseudo-random number generator, stored in the estimator.random_state field.
During the fit procedure, every estimator learns on a subset of the training set; the indices for that subset are generated with a PRNG seeded from estimator.random_state.
This should work:
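# Note: this is a private helper, so its location and signature can change.
# In scikit-learn >= 0.22 it lives in sklearn.ensemble._forest and takes a
# third argument, n_samples_bootstrap.
from sklearn.ensemble.forest import _generate_unsampled_indices

# X here - training set of examples; rf - an already fitted random forest
n_samples = X.shape[0]
for tree in rf.estimators_:
    # Here at each iteration we obtain out of bag samples for every tree.
    unsampled_indices = _generate_unsampled_indices(
        tree.random_state, n_samples)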
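The reason this works is that the indices are never stored: they are redrawn from each tree's seed. The sketch below illustrates that idea with the standard library only. It is not scikit-learn's implementation; the functions `sample_indices` and `unsampled_indices` and the seeds in `tree_seeds` are hypothetical stand-ins for the private helpers and for each estimator's random_state.

```python
import random

def sample_indices(seed, n_samples):
    # Re-create the PRNG from the stored seed and redraw the bootstrap indices.
    rng = random.Random(seed)
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def unsampled_indices(seed, n_samples):
    # Out-of-bag rows are those the seeded draw never selected.
    drawn = set(sample_indices(seed, n_samples))
    return [i for i in range(n_samples) if i not in drawn]

# Hypothetical per-tree seeds standing in for each estimator's random_state.
tree_seeds = [11, 22, 33]
n_samples = 20

# Regenerating twice from the same seeds gives identical OOB indices.
first = [unsampled_indices(s, n_samples) for s in tree_seeds]
second = [unsampled_indices(s, n_samples) for s in tree_seeds]
```

Because the draw is fully determined by the seed, the in-bag and out-of-bag rows can be recovered after fitting, as long as the same sampling procedure and the same number of samples are used.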