
R^2 (coefficient of determination) calculations using numpy and sklearn give different results

I need to calculate the coefficient of determination for a linear regression model.

I ran into something strange: the result I get from the definition using numpy functions differs from the result of sklearn.metrics.r2_score. This code shows the difference:

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([2, -0.5, 2.5, 3, 0])
y_pred = np.array([2.5, 0.0, 3, 8, 0])

r2_score(y_true, y_pred)

>>> -1.6546391752577323

def my_r2_score(y_true, y_pred):
    return 1 - np.sum((y_true - y_pred) ** 2) / np.sum((np.average(y_true) - y_true) ** 2)

def my_r2_score_var(y_true, y_pred):
    return 1 - np.var(y_true - y_pred) / np.var(y_true)

print(my_r2_score(y_true, y_pred))
print(my_r2_score_var(y_true, y_pred))

>>> -1.6546391752577323
>>> -0.7835051546391754

Can anybody explain this difference?

123123roma asked Jan 20 '26 13:01


1 Answer

my_r2_score_var is wrong because np.var(y_true - y_pred) is not equal to np.sum((y_true - y_pred) ** 2)/5, the mean of the squared residuals.

>>> np.sum((y_true - y_pred) ** 2)/5
5.15
>>> np.var(y_true - y_pred)
3.46

np.var subtracts the mean of its argument before squaring, so what you are actually computing with np.var(y_true - y_pred) is:

>>> np.sum(((y_true - y_pred) - np.average(y_true - y_pred))**2)/5
3.46
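To see exactly where the two numbers part ways, here is a small check. It is only a sketch that reuses the arrays from the question; the names residuals, mean_sq and bias_sq are mine, not from the original code. It relies on the standard identity that a variance equals the mean of the squares minus the square of the mean.

```python
import numpy as np

# Arrays copied from the question.
y_true = np.array([2, -0.5, 2.5, 3, 0])
y_pred = np.array([2.5, 0.0, 3, 8, 0])

residuals = y_true - y_pred

mean_sq = np.mean(residuals ** 2)   # RSS / n, the quantity R^2 needs
bias_sq = np.mean(residuals) ** 2   # squared mean of the residuals
variance = np.var(residuals)        # what my_r2_score_var used instead

print(mean_sq, bias_sq, variance)

# np.var drops the squared mean residual, which is not zero here.
assert np.isclose(variance, mean_sq - bias_sq)
```

The gap between 5.15 and 3.46 is the squared mean residual, (-1.3) ** 2 = 1.69, which np.var silently removes.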

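np.sum((y_true - y_pred) ** 2) is the correct RSS (residual sum of squares).

You assumed np.var(y_true - y_pred) to be the mean RSS (RSS/5 here), but it isn't. The two agree only when the residuals have zero mean, which is not the case for these predictions.

np.var(y_true), on the other hand, does equal the mean TSS (TSS/5 here), because the TSS is measured around the mean of y_true by definition. So only the RSS part of the 1 - RSS/TSS formula is wrong.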
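If you want to keep the "ratio of means" form, one possible fix is to replace np.var on the residuals with a plain mean of squares. The following is a sketch, not the sklearn implementation: r2_from_means is a name I made up, and the x values are hypothetical, added only so that a least-squares line can be fitted. The second half illustrates that the variance-based formula gives the same value when the residuals have zero mean, which holds for a least-squares line with an intercept evaluated on the data it was fitted to.

```python
import numpy as np

def r2_from_means(y_true, y_pred):
    """R^2 written as 1 - (RSS / n) / (TSS / n)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_rss = np.mean((y_true - y_pred) ** 2)  # residuals are NOT centred
    mean_tss = np.var(y_true)                   # centred on mean(y_true)
    return 1 - mean_rss / mean_tss

# Arrays copied from the question.
y_true = np.array([2, -0.5, 2.5, 3, 0])
y_pred = np.array([2.5, 0.0, 3, 8, 0])
print(r2_from_means(y_true, y_pred))  # about -1.6546, same as my_r2_score

# Hypothetical x values, used only to fit a line with an intercept.
x = np.arange(len(y_true), dtype=float)
slope, intercept = np.polyfit(x, y_true, 1)
y_fit = slope * x + intercept

# The residuals of this fit average to zero (up to rounding),
# so here the variance-based formula gives the same R^2.
var_based = 1 - np.var(y_true - y_fit) / np.var(y_true)
assert np.isclose(var_based, r2_from_means(y_true, y_fit))
```

For arbitrary predictions such as the y_pred in the question, nothing forces the residuals to average to zero, so the mean-of-squares version is the safe one.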

timgeb answered Jan 22 '26 03:01


