R - Calculate Test MSE given a trained model from a training set and a test set

Given two simple sets of data:

    head(training_set)
      x         y
    1 1  2.167512
    2 2  4.684017
    3 3  3.702477
    4 4  9.417312
    5 5  9.424831
    6 6 13.090983

    head(test_set)
      x        y
    1 1 2.068663
    2 2 4.162103
    3 3 5.080583
    4 4 8.366680
    5 5 8.344651

I want to fit a linear regression line to the training data, then use that fitted line (or its coefficients) to calculate the "test MSE", i.e. the mean squared error of the residuals when the model's predictions are evaluated on the test data.

model = lm(y ~ x, data = training_set)
train_MSE = mean(model$residuals^2)
test_MSE = ?
asked Oct 01 '16 by Jebathon


People also ask

How do you calculate MSE for test data in R?

To calculate the MSE, we need to square the residuals so that a negative residual contributes the same amount to the mean as a positive residual of equal magnitude. For example, say the actual value of the response variable was 7 and we predicted 5. The residual is 7 - 5 = 2, which squares to 4; an equally-far-off prediction of 9 would give a residual of -2, which also squares to 4.
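As a quick sanity check of that arithmetic in R (the values are just the ones from the example above):

    observed  <- 7
    predicted <- 5
    observed - predicted        # residual: 2
    (observed - predicted)^2    # squared residual: 4 (a residual of -2 squares to 4 too)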

How do you calculate MSE for a model?

To find the MSE, take the observed value, subtract the predicted value, and square that difference. Repeat that for all observations. Then, sum all of those squared values and divide by the number of observations.
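A minimal sketch of those steps in R, assuming you already have vectors of observed and predicted values (the vectors here are made-up illustrations, not data from the question):

    observed  <- c(2.1, 4.2, 5.1)             # hypothetical observed responses
    predicted <- c(2.0, 4.5, 4.8)             # hypothetical model predictions
    squared_errors <- (observed - predicted)^2
    sum(squared_errors) / length(observed)    # the MSE; equivalently mean(squared_errors)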

What is the difference between training MSE and test MSE?

Training Error versus Test Error: A smaller MSE means that the estimate is more accurate. It is important to realise that this MSE value is computed using only the training data, that is, only the data that the model was fitted on. Hence, it is actually known as the training MSE.


1 Answer

In this case, it is more precise to call it MSPE (mean squared prediction error):

test_MSE <- mean((test_set$y - predict(model, newdata = test_set))^2)

This is a useful measure because prediction is the ultimate aim of most models; we want the model with the smallest MSPE.

In practice, if we do have a spare test dataset, we can compute the MSPE directly as above. Very often, however, there is no spare data; in statistics, leave-one-out cross-validation (LOOCV) provides an estimate of the MSPE from the training dataset alone.
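For a model fit with lm(), the LOOCV estimate of the MSPE does not even require refitting: there is a well-known closed form in which each residual is scaled by its leverage. A minimal sketch, assuming the model object from the question:

    # LOOCV (PRESS-based) estimate of MSPE for a linear model:
    # each residual is divided by (1 - leverage) before squaring
    mean((residuals(model) / (1 - hatvalues(model)))^2)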

There are also several other statistics for assessing prediction error, such as Mallows's Cp and the AIC.
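For example, base R reports the AIC of a fitted model directly (again assuming the model object from the question):

    AIC(model)   # Akaike information criterion for the fitted lm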

answered Sep 27 '22 by Zheyuan Li