I believe I'm making an error in my calculation of RMSE in Python with NumPy. Here is the code:
import numpy as np
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
e = abs(np.matrix(y_pred) - np.matrix(y_true)).A1
ee = np.dot(e,e)
np.sqrt(ee.sum()/3)
This returns: 0.707
However, when I try with sklearn:
from sklearn.metrics import mean_squared_error
mean_squared_error(np.matrix(y_true),np.matrix(y_pred))**0.5
This returns: 0.612
Any idea what is going on? I'm pretty sure my Python code is correct.
You're not making an error. You're dividing by 3 (n-1), while sklearn divides by 4 (n):
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
e = abs(np.matrix(y_pred) - np.matrix(y_true)).A1
ee = np.dot(e,e)
np.sqrt(ee.sum()/4)
0.61237243569579447
Dividing by n-1 gives you an unbiased estimate and is used when calculating second moments (such as the variance) from a sample. When calculating these same moments for a whole population, we divide by n.
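To make the n versus n-1 distinction concrete, here is a small sketch using NumPy's `ddof` argument to `np.var`. It is only an illustration of the two divisors on the question's `y_true` values, not part of either calculation in the question.

```python
import numpy as np

values = np.array([3, -0.5, 2, 7], dtype=float)
n = len(values)

# ddof=0 divides the sum of squared deviations by n (population form).
pop_var = np.var(values, ddof=0)

# ddof=1 divides by n - 1 (sample form, the unbiased estimate).
sample_var = np.var(values, ddof=1)

# The two differ only by the factor n / (n - 1).
print(pop_var, sample_var, sample_var / pop_var)
```

With n = 4 the ratio is 4/3, which is the same factor separating the two squared RMSE values in the question (0.5 versus 0.375).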
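The correct formula for the RMSE is:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$

where the \hat{y}_i are the predictions and the y_i are the true values. In your case, n = len(y_pred) = len(y_true) = 4.
So, to get the same result as sklearn, change np.sqrt(ee.sum()/3) to np.sqrt(ee.sum()/len(y_pred)).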
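As a minimal sketch, the same calculation can be written as a small helper that divides by n through `np.mean`. The function name `rmse` is just an illustrative choice, and it uses plain arrays via `np.asarray` rather than `np.matrix`.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, dividing by n (the number of samples)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # np.mean divides the sum of squared errors by len(y_true).
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
result = rmse(y_true, y_pred)
print(result)  # approximately 0.6124, matching the sklearn value
```

The squared errors here are 0.25, 0.25, 0 and 1, which sum to 1.5. Dividing by 4 gives 0.375 (square root about 0.612), while dividing by 3 gives 0.5 (square root about 0.707).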