Given two simple sets of data:
head(training_set)
x y
1 1 2.167512
2 2 4.684017
3 3 3.702477
4 4 9.417312
5 5 9.424831
6 6 13.090983
head(test_set)
x y
1 1 2.068663
2 2 4.162103
3 3 5.080583
4 4 8.366680
5 5 8.344651
I want to fit a linear regression line on the training data, then use that fitted line (i.e. its coefficients) to calculate the "test MSE", the mean squared error of the residuals, on the test data.
model = lm(y ~ x, data = training_set)
train_MSE = mean(model$residuals^2)
test_MSE = ?
To calculate the MSE, we square the residuals so that a negative residual contributes to the mean just as much as a positive residual of the same magnitude. For example, say the actual value of the response variable was 5, and we predicted 7. The residual is 5 − 7 = −2, which contributes (−2)² = 4 to the mean, the same as a residual of +2.
To find the MSE, take the observed value, subtract the predicted value, and square that difference. Repeat that for all observations. Then, sum all of those squared values and divide by the number of observations.
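In symbols, this is the standard definition: for n observations with actual values y_i and predicted values ŷ_i,

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$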
Training Error versus Test Error

A smaller MSE means that the estimate is more accurate. It is important to realise that this MSE value is computed using only the training data, i.e. only the data the model was fitted on. Hence it is known as the training MSE.
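The train_MSE line in the question computes exactly this quantity; an equivalent form using fitted() makes the "observed minus predicted" structure explicit (a sketch reusing the model and training_set objects already defined above):

# Same value as mean(model$residuals^2): observed minus fitted, squared, averaged
train_MSE <- mean((training_set$y - fitted(model))^2)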
When the same quantity is computed on held-out test data, it is more precise to call it the MSPE (mean squared prediction error):
mean((test_set$y - predict.lm(model, test_set)) ^ 2)
This is a more useful measure when the goal is prediction: we want the model with minimal MSPE.
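Broken into steps, the one-liner above reads as follows (a sketch reusing model and test_set from the question; the generic predict() dispatches to predict.lm() for an lm object):

# Predict the test responses from the model fit on the training data
predictions <- predict(model, newdata = test_set)

# Prediction errors: observed minus predicted
test_residuals <- test_set$y - predictions

# Mean squared prediction error (the "test MSE" asked for)
test_MSE <- mean(test_residuals^2)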
In practice, if we have a spare test data set, we can compute the MSPE directly, as above. Very often, however, we don't have spare data. In that case, leave-one-out cross-validation (LOOCV) gives an estimate of the MSPE from the training data alone.
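For a linear model, the LOOCV estimate has a closed form based on the hat matrix, so no explicit refitting loop is needed (a standard identity for lm fits, not something from the question itself):

# Leave-one-out CV error for an lm fit: scale each residual by 1/(1 - leverage)
loocv_MSE <- mean((residuals(model) / (1 - hatvalues(model)))^2)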
There are also several other statistics for assessing prediction error, such as Mallows's Cp statistic and the AIC.
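AIC, for example, comes straight from the fitted object in R (Mallows's Cp is usually computed when comparing several candidate models, e.g. with the leaps package):

# Akaike information criterion for the fitted model; lower is better
AIC(model)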