How to use scikit-learn Random Forest's oob_decision_function_ for learning curves?

Question

Can someone explain how to use the oob_decision_function_ attribute of scikit-learn's RandomForestClassifier in Python? I want to use it to plot learning curves that compare training and validation error across different training set sizes, in order to identify overfitting and other problems. I can't seem to find any information about how to do this.

maxymoo · Accepted Answer

You can pass a custom scoring function to the scoring parameter of any of the model evaluation tools. It needs to have the signature (classifier, X, y_true) -> score.
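As a minimal sketch of that signature, the callable below ignores the data it is handed and returns the fitted forest's out-of-bag accuracy. The name oob_scorer and the synthetic dataset are illustrative assumptions, not part of the original answer, and the import path assumes a scikit-learn version that has sklearn.model_selection.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def oob_scorer(estimator, X, y_true):
    """Scorer with the (estimator, X, y_true) -> float signature.

    X and y_true are ignored on purpose: the OOB accuracy was already
    computed on the training fold while the forest was being fitted.
    """
    return estimator.oob_score_


# Synthetic data, used here only so the sketch is self-contained.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# oob_score=True is required, otherwise oob_score_ is never set.
forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)

# One OOB accuracy per cross-validation fold.
scores = cross_val_score(forest, X, y, cv=3, scoring=oob_scorer)
print(scores)
```

Any function with this shape can be passed as scoring; a scorer that does use X and y_true would instead evaluate the model on the held-out fold.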

For your case you could use something like

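from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve  # was sklearn.learning_curve in old releases
r = RandomForestClassifier(oob_score=True)  # oob_score_ is only set when oob_score=True
learning_curve(r, X, y, cv=3, scoring=lambda c, x, y: c.oob_score_)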

This will compute 3-fold cross-validated OOB scores for different training set sizes. By the way, I don't think you should see much overfitting with random forests; resistance to it is one of their benefits.
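Because the scorer above ignores the data it is given, the training and test scores that learning_curve returns will both report the same OOB value. If you want to work with oob_decision_function_ directly and compare it against a separate validation set, a manual loop is one option. The following is a rough sketch under assumptions not in the original answer: synthetic data, a fixed list of training sizes, and a single held-out split rather than cross-validation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data and a single hold-out split (illustrative assumptions).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

sizes = [50, 100, 200, len(X_train)]
train_errors, oob_errors, val_errors = [], [], []

for n in sizes:
    X_n, y_n = X_train[:n], y_train[:n]
    forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
    forest.fit(X_n, y_n)

    # Shape (n, n_classes): for each training sample, the class probabilities
    # averaged over only the trees that did not see that sample.
    proba = forest.oob_decision_function_

    # A sample that was in every bootstrap draw has no OOB estimate (NaN row);
    # this is unlikely with many trees, but such rows are skipped to be safe.
    seen = ~np.isnan(proba).any(axis=1)
    oob_pred = forest.classes_[np.argmax(proba[seen], axis=1)]

    oob_errors.append(float(np.mean(oob_pred != y_n[seen])))
    train_errors.append(1.0 - forest.score(X_n, y_n))
    val_errors.append(1.0 - forest.score(X_val, y_val))

for n, tr, oob, val in zip(sizes, train_errors, oob_errors, val_errors):
    print(f"n={n:4d}  train={tr:.3f}  oob={oob:.3f}  val={val:.3f}")
```

Plotting the three error lists against sizes gives the learning curve. The OOB error plays the role of a validation error that needs no extra data, so it can be read alongside the hold-out error as a sanity check.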
