
Python Pure RMSE vs Sklearn

I believe I'm making an error in my calculation of RMSE in pure Python. Here is the code:

import numpy as np

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
e = abs(np.matrix(y_pred) - np.matrix(y_true)).A1  # absolute errors as a flat array
ee = np.dot(e,e)  # sum of squared errors
np.sqrt(ee.sum()/3)

This returns: 0.707

However, when I try with sklearn:

from sklearn.metrics import mean_squared_error

mean_squared_error(np.matrix(y_true),np.matrix(y_pred))**0.5
This returns: 0.612

Any idea what is going on? I'm pretty sure my Python code is correct.

cloud36 asked Oct 20 '25 16:10


2 Answers

You're not making an error. You're dividing by 3 (n-1), while sklearn divides by 4 (n). Dividing by 4 reproduces sklearn's result:

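import numpy as np

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
e = abs(np.matrix(y_pred) - np.matrix(y_true)).A1  # absolute errors as a flat array
ee = np.dot(e,e)  # sum of squared errors
np.sqrt(ee.sum()/4)  # divide by n = 4, as sklearn does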

0.61237243569579447

Dividing by n-1 gives an unbiased estimate and is used when calculating second moments (such as the variance) from a sample. When calculating the same moments for a population, we divide by n. Here are some links that could be relevant: Wikipedia, Some other link.
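To illustrate the n versus n-1 distinction, here is a minimal sketch using Python's standard `statistics` module, where `pvariance` divides by n and `variance` divides by n-1. Applying it to the residuals from the question is my own choice for illustration; note that both functions centre the data on its mean, so these values are variances of the residuals, not the MSE itself.

```python
import statistics

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

# Residuals of the predictions: [-0.5, 0.5, 0.0, 1.0]
residuals = [p - t for p, t in zip(y_pred, y_true)]

# Population variance: sum of squared deviations from the mean, divided by n
pop_var = statistics.pvariance(residuals)

# Sample variance: same sum, divided by n - 1
sample_var = statistics.variance(residuals)

print(pop_var, sample_var)
```

The two results differ only by the factor n / (n - 1), which is the same factor that separates the 0.707 and 0.612 values in the question (after taking the square root).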

piRSquared answered Oct 23 '25 05:10


The correct formula for the RMSE is:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$$

In your case, n = len(y_pred) = len(y_true) = 4. So to get the right result, change np.sqrt(ee.sum()/3) to np.sqrt(ee.sum()/len(y_pred)).
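As a rough sketch of that fix in pure Python, the function below divides by the number of observations instead of a hard-coded constant. The name `rmse` and the length check are my own additions, not part of the question's code.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, dividing by n = len(y_true)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    # Sum of squared differences between predictions and targets
    sse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true))
    return math.sqrt(sse / n)

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
result = rmse(y_true, y_pred)
print(round(result, 3))  # 0.612, matching the sklearn value in the question
```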

MMF answered Oct 23 '25 06:10