
How to obtain the RMSE from an lm result?

I know there is a small difference between $sigma and the concept of root mean squared error, so I am wondering: what is the easiest way to obtain the RMSE from the result of the lm function in R?

res <- lm(price ~ carat + cut + color + clarity +
            depth + table + x + y + z,
          data = randomData)

length(coefficients(res))

returns 24 coefficients (the factor predictors cut, color and clarity each expand into several dummy coefficients), so I can no longer compute the model's predictions by hand. How can I evaluate the RMSE of the model fitted by lm?

asked Mar 30 '17 by Jeff


1 Answer

Residual sum of squares:

RSS <- c(crossprod(res$residuals))

Mean squared error:

MSE <- RSS / length(res$residuals)

Root MSE:

RMSE <- sqrt(MSE)

Pearson estimate of the residual variance (summary.lm reports its square root as the residual standard error, sigma):

sig2 <- RSS / res$df.residual

Statistically, the MSE is the maximum likelihood estimator of the residual variance, but it is biased (downward). The Pearson estimator is the restricted maximum likelihood (REML) estimator of the residual variance, which is unbiased.
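Putting the steps together: since randomData is not shown, here is a minimal end-to-end sketch using the built-in mtcars data as a stand-in (substitute your own data and formula).

# stand-in model -- mtcars is purely illustrative
res <- lm(mpg ~ wt + hp + disp, data = mtcars)

RSS  <- c(crossprod(res$residuals))   # residual sum of squares
MSE  <- RSS / length(res$residuals)   # ML estimate of residual variance
RMSE <- sqrt(MSE)                     # root mean squared error
sig2 <- RSS / res$df.residual         # unbiased (Pearson) estimate

# cross-check: the square of summary.lm's sigma equals sig2
all.equal(sig2, summary(res)$sigma ^ 2)   # TRUE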


Remark

  • Given two vectors x and y, c(crossprod(x, y)) is equivalent to sum(x * y) but much faster. c(crossprod(x)) is likewise faster than sum(x ^ 2). (A quick timing sketch follows below.)
  • sum(x) / length(x) is also faster than mean(x).
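To check the speed claim on your own machine, a rough timing sketch (numbers will vary with your BLAS library and hardware):

x <- rnorm(1e7)
all.equal(sum(x ^ 2), c(crossprod(x)))          # same result
system.time(for (i in 1:20) sum(x ^ 2))         # plain R version
system.time(for (i in 1:20) c(crossprod(x)))    # BLAS-backed version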
answered Oct 05 '22 by Zheyuan Li