 

Evaluate Random Forest performance using R-Squared

I'm looking to evaluate the test performance of a random forest regressor in Python. In addition to running cross-validation on the training set, I'm wondering whether it is appropriate to run some sort of correlation analysis between the predicted Y values for the test set and the actual Y test values.

My (possibly oversimplified) thinking is that a significant correlation between the two would indicate that the predicted Y values are aligned with the actual test Y values and, therefore, that the predictions are good.
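A minimal sketch of the workflow described above, assuming scikit-learn and SciPy are available. The synthetic dataset from `make_regression` and the hyperparameters are placeholders for illustration, not the original data:

```python
from scipy.stats import pearsonr
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data stands in for the real dataset (assumption for illustration).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)

# Cross-validation on the training set only; a regressor's default score is R^2.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("CV R^2: %.3f +/- %.3f" % (cv_scores.mean(), cv_scores.std()))

# Fit on the whole training set, then predict the held-out test set.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Correlation between predicted and actual test targets.
r, p_value = pearsonr(y_test, y_pred)
print("Pearson r = %.3f (p = %.3g)" % (r, p_value))
```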

Any alternative suggestions are more than welcome. Thanks.

cookie1986 asked May 18 '26 01:05


1 Answer

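You can run a correlation analysis, and it is appropriate, but a high correlation does not always mean that your model is good. Pearson correlation is unchanged if every prediction is multiplied by a positive constant or shifted by a constant, so a model whose predictions are systematically too large (or too small) can still have a correlation close to 1. You should also look at the spread of the errors. Depending on the task you are solving (classification, segmentation, regression, etc.), you can use dedicated metrics to measure how well you predict. You can find different metrics here: http://scikit-learn.org/stable/modules/model_evaluation.html.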
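To illustrate the point above, here is a small sketch with hypothetical hand-made predictions (not output from a real model). It computes `r2_score`, `mean_squared_error` and `mean_absolute_error` from scikit-learn's metrics module and compares them with the plain Pearson correlation from `np.corrcoef`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical test targets.
y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

predictions = {
    # Systematically wrong: every prediction is 2 * truth + 10.
    "biased": 2.0 * y_true + 10.0,
    # Unbiased, with small errors.
    "noisy": y_true + np.array([0.5, -0.5, 0.3, -0.3, 0.0]),
}

results = {}
for name, y_pred in predictions.items():
    results[name] = {
        "r": np.corrcoef(y_true, y_pred)[0, 1],  # Pearson correlation
        "r2": r2_score(y_true, y_pred),  # coefficient of determination
        "mse": mean_squared_error(y_true, y_pred),
        "mae": mean_absolute_error(y_true, y_pred),
    }
    print(name, {k: round(float(v), 3) for k, v in results[name].items()})
```

The biased predictions correlate perfectly with the truth (r = 1), yet their R² is about -36 and their MSE is 297, while the noisy predictions have a slightly lower r but an R² of about 0.98. Note also that `r2_score` (the coefficient of determination) is in general not the square of the Pearson correlation; the two coincide only in special cases such as a least-squares linear fit with an intercept, evaluated on its own training data.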

Vage Egiazarian answered May 19 '26 13:05