Confidence intervals for Ridge regression

I can't compute confidence intervals for a ridge regression. I have this model:

model5 <- glmnet(train_x, train_y, family = "gaussian", alpha = 0, lambda = 0.01)

And when I make predictions I use this command:

test_pred <- predict(model5, test_x, type = "link")

Does anyone know how to compute confidence intervals for these predictions?

Ana Laura Carreiras asked Sep 28 '16

People also ask

What is the formula for ridge regression?

In ridge regression, however, the formula for the hat matrix must include the regularization penalty: H_ridge = X(X′X + λI)^(−1)X′, which gives df_ridge = tr(H_ridge), no longer equal to the number of predictors m. Some ridge regression software nevertheless produces information criteria based on the OLS formula.
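The formula above can be checked numerically in base R. This is an illustrative sketch with simulated data (the matrix X, the penalty lambda, and the dimensions are made up, not taken from the question):

```r
set.seed(1)
n <- 50; m <- 5
X <- matrix(rnorm(n * m), n, m)   # simulated design matrix
lambda <- 2                       # arbitrary ridge penalty

# Ridge hat matrix: H_ridge = X (X'X + lambda I)^{-1} X'
H_ridge <- X %*% solve(crossprod(X) + lambda * diag(m)) %*% t(X)

# Effective degrees of freedom: the trace of H_ridge
df_ridge <- sum(diag(H_ridge))

# With lambda > 0, df_ridge is strictly less than m;
# at lambda = 0 it would equal m, the OLS value.
df_ridge
```

Shrinking each eigen-direction by d_i^2 / (d_i^2 + lambda) is what pulls the trace below m, which is why the OLS information-criterion formula overstates the model's complexity.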

What are the assumptions of ridge regression?

The assumptions of ridge regression are the same as those of linear regression: linearity, constant variance, and independence. However, because ridge regression does not provide confidence limits, the errors need not be assumed normally distributed.

Can ridge regression be used for regression?

Ridge regression is a method for handling multicollinearity in multiple regression data. It is most suitable when a data set contains more predictor variables than observations.

Is Ridge better than OLS?

Studies by [17], who applied ridge regression to the unemployment rate in Iraq, recommended ridge regression over OLS because it provides better estimates when the independent variables are correlated, without omitting any of them.


1 Answer

It turns out that glmnet deliberately does not provide standard errors (and therefore does not give you confidence intervals), as explained here and also addressed in this vignette (excerpt below):

It is a very natural question to ask for standard errors of regression coefficients or other estimated quantities. In principle such standard errors can easily be calculated, e.g. using the bootstrap.

Still, this package deliberately does not provide them. The reason for this is that standard errors are not very meaningful for strongly biased estimates such as arise from penalized estimation methods. Penalized estimation is a procedure that reduces the variance of estimators by introducing substantial bias. The bias of each estimator is therefore a major component of its mean squared error, whereas its variance may contribute only a small part.

Unfortunately, in most applications of penalized regression it is impossible to obtain a sufficiently precise estimate of the bias. Any bootstrap-based calculations can only give an assessment of the variance of the estimates. Reliable estimates of the bias are only available if reliable unbiased estimates are available, which is typically not the case in situations in which penalized estimates are used.

Reporting a standard error of a penalized estimate therefore tells only part of the story. It can give a mistaken impression of great precision, completely ignoring the inaccuracy caused by the bias. It is certainly a mistake to make confidence statements that are only based on an assessment of the variance of the estimates, such as bootstrap-based confidence intervals do.

Reliable confidence intervals around the penalized estimates can be obtained in the case of low dimensional models using the standard generalized linear model theory as implemented in lm, glm and coxph. Methods for constructing reliable confidence intervals in the high-dimensional situation are, to my knowledge, not available.
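For the low-dimensional, unpenalized case the excerpt points to, confidence intervals come directly from lm via predict(..., interval = "confidence"). A quick sketch with simulated data (not the asker's):

```r
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)          # simulated linear relationship
fit <- lm(y ~ x)

# Pointwise 95% confidence intervals for the mean response
new_data <- data.frame(x = c(-1, 0, 1))
ci <- predict(fit, new_data, interval = "confidence", level = 0.95)
ci  # columns: fit, lwr, upr
```

Use interval = "prediction" instead if you want intervals for new observations rather than for the mean response.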

However, if you insist on confidence intervals, check out this post.
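If you do go the bootstrap route the vignette cautions about, a minimal sketch looks like this. To keep it self-contained it uses a closed-form ridge fit in base R as a stand-in for glmnet with alpha = 0 (with glmnet you would refit model5 on each resample instead); train_x, train_y, and test_x are simulated placeholders for the asker's data. Remember the caveat above: this band reflects only the variance of the predictions, not their bias.

```r
set.seed(1)
n <- 80; p <- 4; lambda <- 0.01
train_x <- matrix(rnorm(n * p), n, p)
train_y <- drop(train_x %*% c(1, -1, 0.5, 0) + rnorm(n))
test_x  <- matrix(rnorm(5 * p), 5, p)

# Closed-form ridge predictions (no intercept; stand-in for glmnet, alpha = 0)
ridge_pred <- function(xtr, ytr, xte, lambda) {
  beta <- solve(crossprod(xtr) + lambda * diag(ncol(xtr)),
                crossprod(xtr, ytr))
  drop(xte %*% beta)
}

# Bootstrap: refit on resampled rows, collect test-set predictions
B <- 200
boot_pred <- replicate(B, {
  idx <- sample(n, replace = TRUE)
  ridge_pred(train_x[idx, ], train_y[idx], test_x, lambda)
})

# Percentile band for each test prediction (variance only -- see caveat)
ci <- apply(boot_pred, 1, quantile, probs = c(0.025, 0.975))
t(ci)  # one row per test observation: 2.5% and 97.5% limits
```

The percentile limits here are centered on the shrunken (biased) predictions, which is exactly why the vignette warns they can suggest more accuracy than you actually have.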

ilanman answered Oct 06 '22