
sklearn r2_score and scipy.stats linregress give very different values of R^2. Why?

I'm using the same data but different Python libraries to calculate the coefficient of determination R^2. scipy.stats and sklearn yield different results.

What is the reason behind this behavior?

# Using scipy.stats.linregress
from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print(r_value**2)

0.956590054918

# Using sklearn
from sklearn.metrics import r2_score
print(r2_score(x, y))

0.603933484937

Pablo Fleurquin asked Mar 22 '16 11:03

1 Answer

The r_value returned by linregress is the Pearson correlation coefficient r of x and y. In general, the squared correlation coefficient r^2 is not the same as the coefficient of determination R^2.
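For reference, the two quantities are defined as follows, where y_i are the true values, \hat{y}_i the predicted values, and bars denote means. Note that r is symmetric in x and y, while R^2 depends on which array plays the role of the true values:

```latex
r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}}
\qquad
R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}
```

R^2 reduces to r^2 only in the special case where \hat{y}_i = a + b x_i is the least-squares line of y on x fitted with an intercept.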

The coefficient of determination tells you how well a model's predictions fit the observed data. Its sklearn signature is r2_score(y_true, y_pred), so r2_score(x, y) treats x as the true values and y as the values predicted by a model.
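A minimal sketch, using hypothetical toy data (not the asker's), that computes R^2 = 1 - SS_res/SS_tot by hand in a hypothetical helper `r2_manual` and shows that r2_score depends on argument order, even when x and y are perfectly correlated (r = 1):

```python
import numpy as np
from sklearn.metrics import r2_score


def r2_manual(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot, with SS_tot taken around mean(y_true)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: y is exactly 2 * x, so x and y are perfectly correlated.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

r = np.corrcoef(x, y)[0, 1]
print(round(r ** 2, 6))   # prints 1.0: squared correlation ignores which array is "true"
print(r2_score(x, y))     # prints -4.5: x treated as truth, y as predictions
print(r2_score(y, x))     # prints -0.375: swapping the arguments changes the result
print(r2_manual(x, y))    # prints -4.5: same formula computed by hand
```

Both R^2 values are negative because y (or x) used directly as a "prediction" of the other array is far from it, even though a straight line relates them perfectly.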

If your x and y really are true and predicted data, r2_score(x, y) is what you want. However, if both are measured data, you most likely want r_value**2 instead.
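A minimal sketch, assuming synthetic noisy linear data (not the asker's), that compares r_value**2 from linregress with r2_score applied to the fitted line's predictions and to the raw (x, y) pair:

```python
import numpy as np
from scipy import stats
from sklearn.metrics import r2_score

# Synthetic data (an assumption for illustration): a noisy straight line.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 3.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

# Predictions of the least-squares line fitted by linregress.
y_fit = intercept + slope * x

r_squared = r_value ** 2          # squared Pearson correlation of x and y
r2_fit = r2_score(y, y_fit)       # R^2 of the fitted line against the observed y
r2_raw = r2_score(x, y)           # the question's call: x as truth, y as "predictions"

print(r_squared, r2_fit, r2_raw)
```

The first two numbers agree up to floating-point rounding, because the predictions come from the least-squares fit with an intercept; the third measures something else entirely.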

Details about the correlation coefficient and the coefficient of determination can be found on Wikipedia.

MB-F answered Oct 06 '22 00:10