
How to use scikit-learn Random Forest's oob_decision_function_ for learning curves?

Can someone explain how to use the oob_decision_function_ attribute of the scikit-learn Random Forest classifier in Python? I want to use it to plot learning curves that compare training and validation error across different training set sizes, in order to identify overfitting and other problems. I can't seem to find any information about how to do this.

user123959 asked Dec 13 '25 09:12


1 Answer

You can pass a custom scoring function to the scoring parameter of any of the model evaluation tools (cross_val_score, GridSearchCV, learning_curve and so on). It needs to be a callable with the signature (classifier, X, y_true) -> score.

For your case you could use something like

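from sklearn.model_selection import learning_curve  # was sklearn.learning_curve before 0.18
# r must be a RandomForestClassifier(oob_score=True), otherwise oob_score_ is never set
learning_curve(r, X, y, cv=3, scoring=lambda c, x, y: c.oob_score_)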

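This computes the forest's OOB score on each of the 3 cross-validation splits, for a range of training set sizes. Note that the scorer ignores the data it is given, so the "training" and "validation" scores that learning_curve returns will be identical. By the way, I don't think you should see much overfitting with random forests; that's one of their benefits.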
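If you want to work with oob_decision_function_ directly, as the question asks, here is a rough sketch rather than a definitive recipe. It refits a forest on growing subsets of the data and compares the training error with the error of the out-of-bag predictions. The synthetic dataset, the helper name oob_learning_curve, the training sizes and n_estimators=200 are all my own assumptions for illustration; swap in your own X and y.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for the asker's X and y (an assumption).
X, y = make_classification(n_samples=600, n_features=20,
                           n_informative=5, random_state=0)


def oob_learning_curve(X, y, train_sizes, n_estimators=200, seed=0):
    """Hypothetical helper: training and OOB error for each training size."""
    rng = np.random.RandomState(seed)
    order = rng.permutation(len(y))  # shuffle once, then take prefixes
    train_err, oob_err = [], []
    for n in train_sizes:
        idx = order[:n]
        forest = RandomForestClassifier(n_estimators=n_estimators,
                                        oob_score=True,  # needed for OOB attributes
                                        random_state=seed)
        forest.fit(X[idx], y[idx])
        # Shape (n_samples, n_classes): class probabilities for each training
        # sample, averaged only over the trees that did not see that sample.
        # Rows can be NaN if a sample was never out of bag (too few trees).
        oob_proba = forest.oob_decision_function_
        oob_pred = forest.classes_[np.argmax(oob_proba, axis=1)]
        oob_err.append(np.mean(oob_pred != y[idx]))
        train_err.append(1.0 - forest.score(X[idx], y[idx]))
    return np.array(train_err), np.array(oob_err)


train_sizes = [50, 100, 200, 400, 600]
train_err, oob_err = oob_learning_curve(X, y, train_sizes)
for n, tr, ob in zip(train_sizes, train_err, oob_err):
    print(f"n={n:4d}  training error={tr:.3f}  OOB error={ob:.3f}")
```

Plotting train_err and oob_err against train_sizes gives the learning curve. The OOB error plays the role of the validation error here, so no separate hold-out set is needed.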

maxymoo answered Dec 15 '25 16:12
