 

Is there a simple command to do leave-one-out cross-validation with the lm() function?

Is there a simple command to do leave-one-out cross-validation with the lm() function in R? Specifically, is there a simple command that does what the code below does?

x <- rnorm(1000,3,2)
y <- 2*x + rnorm(1000)

pred_error_sq <- 0 # running sum of squared prediction errors
for(i in 1:1000) {
  x_i <- x[-i]
  y_i <- y[-i]
  mdl <- lm(y_i ~ x_i) # leave i'th observation out
  y_pred <- predict(mdl, data.frame(x_i = x[i])) # predict i'th observation
  pred_error_sq <- pred_error_sq + (y[i] - y_pred)^2 # cumulate squared prediction errors
}

y_squared <- sum((y - mean(y))^2) # Total variation of the data (total sum of squares)

R_squared <- 1 - (pred_error_sq/y_squared) # Measure of goodness of fit
asked Oct 31 '17 by stollenm
People also ask

How do I use leave-one-out cross-validation in R?

Leave-one-out cross-validation (LOOCV) works as follows: leave out one data point and build the model on the rest of the data set; test the model against the data point that was left out and record the test error associated with the prediction; repeat the process for all data points.
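A compact base-R sketch of that procedure, using the same simulated x and y as in the question (the vapply structure here is just one way to write the loop):

x <- rnorm(1000, 3, 2)
y <- 2*x + rnorm(1000)

errs <- vapply(seq_along(y), function(i) {
  fit <- lm(y ~ x, data = data.frame(x = x[-i], y = y[-i]))  # fit without observation i
  y[i] - predict(fit, newdata = data.frame(x = x[i]))        # prediction error for observation i
}, numeric(1))

mean(errs^2) # LOOCV estimate of the mean squared prediction error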

How is leave one out cross-validation calculated?

The leave-one-out cross-validation statistic is given by CV = (1/N) * Σ_{i=1}^{N} e[i]^2, where e[i] = y_i − ŷ[i], the observations are y_1, …, y_N, and ŷ[i] is the predicted value obtained when the model is estimated with the i-th case deleted.
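For a least-squares fit there is also a closed-form shortcut: the leave-one-out residual equals the ordinary residual divided by one minus the leverage, so the statistic can be computed without refitting the model N times. A minimal sketch, assuming the x and y from the question:

fit <- lm(y ~ x)
h <- hatvalues(fit)                   # leverage of each observation
loo_resid <- residuals(fit) / (1 - h) # e[i] = y_i - yhat_[i]
mean(loo_resid^2)                     # the CV statistic from the formula above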

How does the cv.glm() function work?

The cv.glm() function produces a list with several components. The two numbers in the delta vector contain the cross-validation results. In this case the numbers are identical (up to two decimal places) and correspond to the LOOCV statistic: our cross-validation estimate for the test error is approximately 24.23.
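Because a Gaussian glm() fits the same model as lm(), the question's example can be run through boot::cv.glm() directly; a small sketch (set.seed() is only there for reproducibility):

library(boot)

set.seed(1)
x <- rnorm(1000, 3, 2)
y <- 2*x + rnorm(1000)
dat <- data.frame(x = x, y = y)

fit <- glm(y ~ x, data = dat) # gaussian family by default, equivalent to lm()
cv.glm(dat, fit)$delta        # LOOCV by default; raw and bias-corrected estimates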


1 Answer

Another solution is to use the caret package:

library(caret)

x <- rnorm(1000, 3, 2)                           # define x first so it can be referenced below
data <- data.frame(x = x, y = 2*x + rnorm(1000))

train(y ~ x, method = "lm", data = data, trControl = trainControl(method = "LOOCV"))

Linear Regression

1000 samples
   1 predictor

No pre-processing
Resampling: Leave-One-Out Cross-Validation
Summary of sample sizes: 999, 999, 999, 999, 999, 999, ...
Resampling results:

  RMSE      Rsquared  MAE
  1.050268  0.940619  0.836808

Tuning parameter 'intercept' was held constant at a value of TRUE
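The numbers in that printout are also stored on the fitted object, so they can be pulled out programmatically; a sketch, assuming the train() call above is assigned to fit_caret:

fit_caret <- train(y ~ x, method = "lm", data = data,
                   trControl = trainControl(method = "LOOCV"))

fit_caret$results        # RMSE, Rsquared and MAE as a one-row data frame
fit_caret$results$RMSE^2 # LOOCV mean squared prediction error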

answered Oct 09 '22 by amarchin