
How is the R2 value in Scikit learn calculated?

The R^2 value returned by scikit learn (metrics.r2_score()) can be negative. The docs say:

"Unlike most other scores, R² score may be negative (it need not actually be the square of a quantity R)."

However, the Wikipedia article on R^2 mentions no R (not squared) quantity. Perhaps it uses absolute differences instead of squared differences. I really have no idea.

joeally asked Apr 26 '14 09:04



2 Answers

The R^2 in scikit-learn is essentially the same as what is described in the Wikipedia article on the coefficient of determination (search for "the most general definition"). It is 1 - (residual sum of squares / total sum of squares).
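Written out as a formula, this reads as follows. The symbols are my own notation rather than anything taken from the scikit-learn source: y_i are the observed values, \hat{y}_i the predictions, and \bar{y} the mean of the observed values being scored.

```latex
R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}
    = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
```

Nothing in this expression is the square of some quantity R, so it is free to go below zero whenever RSS exceeds TSS.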

The big difference between a classical stats setting and what you usually do in machine learning is that in machine learning you evaluate your score on unseen data. There the model can fit worse than simply predicting the mean, so the result can be negative (it can never exceed 1). If you apply R^2 to the same data you used to fit a least-squares linear model with an intercept, it will lie within [0, 1].
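To make the seen-versus-unseen point concrete, here is a minimal pure-Python sketch. The helpers `r2` and `fit_line` and all the data are made up for illustration; `r2` is a hand-rolled version of the definition that should agree with `metrics.r2_score()` for a single unweighted target, not scikit-learn's actual code.

```python
def r2(y_true, y_pred):
    """Hypothetical helper: 1 - RSS/TSS, with the mean taken over y_true."""
    mean = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - mean) ** 2 for t in y_true)
    return 1 - rss / tss


def fit_line(x, y):
    """Hypothetical helper: ordinary least squares for y = a + b * x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    a = my - b * mx
    return a, b


# Made-up training data with a roughly linear upward trend.
x_train = [0, 1, 2, 3, 4]
y_train = [0.1, 0.9, 2.2, 2.8, 4.0]
a, b = fit_line(x_train, y_train)

# Scored on the data the line was fitted to: stays within [0, 1].
train_r2 = r2(y_train, [a + b * xi for xi in x_train])

# Made-up unseen data that follows a different (downward) trend.
x_test = [5, 6, 7, 8]
y_test = [3.0, 2.5, 2.0, 1.5]

# Scored on unseen data: the line is worse than predicting the mean of y_test.
test_r2 = r2(y_test, [a + b * xi for xi in x_test])

print("train R^2:", train_r2)
print("test R^2:", test_r2)
```

With this data the training score comes out close to 1 and the test score is strongly negative, because on the test points the fitted line is much further from the observations than their own mean is.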

See also this very similar question

eickenberg answered Sep 17 '22 19:09


Since R^2 = 1 - RSS/TSS, R^2 is negative only when RSS/TSS > 1, which happens when our model is even worse than the baseline model that always predicts the mean.

Here RSS = sum of squared differences between the actual values (yi) and the predicted values (yi^), and TSS = sum of squared differences between the actual values (yi) and their mean (i.e. before applying regression). So you can think of TSS as the error of the mean model. A model that beats the mean model has RSS < TSS, so RSS/TSS < 1 and R^2 lies between 0 and 1; a perfect model has RSS = 0 and R^2 = 1. If our model is even worse than the mean model, then RSS > TSS, since the predictions are, on the whole, further from the actual observations than the mean is.
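The three cases can be checked with a small pure-Python sketch. The helper `r2` and the numbers are invented for illustration; the helper follows the 1 - RSS/TSS definition by hand and is not scikit-learn's own code.

```python
def r2(y_true, y_pred):
    """Hypothetical helper: 1 - RSS/TSS, with the mean taken over y_true."""
    mean = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - mean) ** 2 for t in y_true)
    return 1 - rss / tss


y_true = [1.0, 2.0, 3.0, 4.0, 5.0]

# Mean model: always predicts the mean of y_true, so RSS == TSS.
mean_model = [3.0] * len(y_true)

# A model close to the observations: RSS < TSS.
good_model = [1.1, 1.9, 3.2, 3.9, 5.1]

# A model with the trend reversed: RSS > TSS.
bad_model = [5.0, 4.0, 3.0, 2.0, 1.0]

r2_mean = r2(y_true, mean_model)  # 0.0
r2_good = r2(y_true, good_model)  # between 0 and 1
r2_bad = r2(y_true, bad_model)    # -3.0, since RSS = 40 and TSS = 10

print(r2_mean, r2_good, r2_bad)
```

The mean model scores exactly 0, which is why 0 is the natural reference point: anything below it is doing worse than ignoring the inputs altogether.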

Check here for better intuition with visual representation: https://ragrawal.wordpress.com/2017/05/06/intuition-behind-r2-and-other-regression-evaluation-metrics/

ManiS answered Sep 17 '22 19:09