Can someone explain how to use the oob_decision_function_ attribute for the python SciKit Random Forest Classifier? I want to use it to plot learning curves comparing training and validation error against different training set sizes in order to identify overfitting and other problems. Can't seem to find any information about how to do this.
You can pass in a custom scoring function into any of the scoring parameters in the model evaluation fields, it needs to have the signiture classifier, X, y_true -> score.
For your case you could use something like
from sklearn.learning_curve import learning_curve
learning_curve(r, X, y, cv=3, scoring=lambda c,x,y: c.oob_score_)
This will compute 3-fold cross validated oob scores against different training set sizes. Btw I don't think you should get overfitting with random forests, that's one of the benefits of them.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With