I'm looking to evaluate the test performance of a random forest regressor in Python. In addition to running cross-validation on the training set, I am wondering whether it is appropriate to run some sort of correlation analysis between the predicted Y test results and the actual Y test results.
My possibly oversimplified thinking is that a significant correlation between the two would indicate that the predicted Y's are aligned with the actual test Y's and, as such, that the predictions are good...
Any alternative suggestions are more than welcome. Thanks.
Yes, running a correlation analysis is appropriate, but a high correlation does not always mean that your model is good. You must also take a look at the variation. Depending on the task you are solving (classification, segmentation, regression, etc.), you can also use metrics to measure how well you predict. You can find different metrics here: http://scikit-learn.org/stable/modules/model_evaluation.html.
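To illustrate why a high correlation alone can mislead, here is a minimal sketch in plain Python. The numbers are made up for illustration (they are not from a real random forest), and the helper names `pearson_r`, `r_squared`, and `mean_abs_error` are my own. The predictions are a scaled and shifted copy of the true values, so they correlate perfectly while being far off in absolute terms.

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson correlation between actual and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def mean_abs_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical data: predictions are 2 * y + 10, i.e. systematically wrong
# but perfectly linearly related to the actual values.
y_true = [1, 2, 3, 4, 5]
y_pred = [2 * y + 10 for y in y_true]

print("Pearson r:", pearson_r(y_true, y_pred))
print("R^2:      ", r_squared(y_true, y_pred))
print("MAE:      ", mean_abs_error(y_true, y_pred))
```

Here the correlation is 1.0, yet R^2 is strongly negative (-84.5) and the mean absolute error is 13.0, so the model would be useless in practice. In scikit-learn, `r2_score`, `mean_absolute_error`, and `mean_squared_error` from `sklearn.metrics` compute the same kinds of quantities on your actual test predictions.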