Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

cv.glmnet vs glmnet results; gauging explanatory power

Tags:

r

glmnet

When estimating a lasso model via the glmnet package, I am wondering whether it is better to: (a) pull coefficients / predictions / deviance straight from the cv.fit object procured from cv.glmnet, or (b) use the minimum lambda from cv.glmnet to re-run glmnet and pull these objects from the glmnet process. (Please be patient -- I have a feeling that this is documented, but I'm seeing examples/tutorials of both online, and no solid logic for going one way or the other.)

That is, for coefficients, I can run (a):

cvfit = cv.glmnet(x=xtrain, y=ytrain, alpha=1, type.measure = "mse", nfolds = 20)
coef.cv <- coef(cvfit, s = "lambda.min")

Or I can afterwards run (b):

fit = glmnet(x=xtrain, y=ytrain, alpha=1, lambda=cvfit$lambda.min)
coef <- coef(fit, s = "lambda.min")

While these two processes select the same model variables, they do not produce identical coefficients. Similarly, I could predict via either of the following two processes:

prdct <- predict(fit,newx=xtest)
prdct.cv <- predict(cvfit, newx=xtest, s = "lambda.min")

And they predict similar but NOT identical vectors.

Last, I would have THOUGHT I could pull % deviance explained via either of the two methods:

percdev <- fit$dev.ratio
percdev.cv <- cvfit$glmnet.fit$dev.ratio[cvfit$cvm==mse.min.cereal]

But in fact, it is not possible to pull percdev.cv in this way, because if the lambda sequence used by cv.glmnet has less than 100 elements, the lengths of cvfit$glmnet.fit$dev.ratio and cvfit$cvm==mse.min.cereal don't match. So I'm not quite sure how to pull the minimum-lambda dev.ratio from cvfit$glmnet.fit.

So I guess I'm wondering which process is best, why, and how people normally pull the appropriate dev.ratio statistic. Thanks!

like image 991
Leah Bevis Avatar asked Nov 30 '17 21:11

Leah Bevis


People also ask

What is the difference between Glmnet and CV Glmnet?

cv. glmnet uses cross validation whereas glmnet simply relies on the cost function.

What is Alpha in CV Glmnet?

glmnet .) alpha is for the elastic net mixing parameter α, with range α∈[0,1]. α=1 is lasso regression (default) and α=0 is ridge regression.

What is CVM in Glmnet?

glmnet" is returned, which is a list with the ingredients of the cross-validation fit. lambda the values of lambda used in the fits. cvm The mean cross-validated error - a vector of length length(lambda) . cvsd estimate of standard error of svm .


1 Answers

As pointed out in the comments, it has to do with the lambda sequence provided, because if you look under the source code of cv.glmnet, it calls glmnet:::cv.glmnet.raw which in the first few lines, run glmnet() on the defined lambda values.

So we can use an example below:

library(mlbench)
data(BostonHousing)
data = BostonHousing
data$chas=as.numeric(data$chas)

cvfit = cv.glmnet(x=as.matrix(data[,-14]),y=data[,14])
coef.cv <- coef(cvfit, s = "lambda.min")

fit = glmnet(x=as.matrix(data[,-14]), y=data[,14], alpha=1, lambda=cvfit$lambda.min)
coef <- coef(fit, s = "lambda.min")

head(cbind(coef.cv,coef))
6 x 2 sparse Matrix of class "dgCMatrix"
                       1            1
(Intercept)  31.74123706  31.86654225
crim         -0.09834634  -0.09869320
zn            0.04144161   0.04158829
indus         .            .         
chas          2.68518774   2.68163334
nox         -16.30664523 -16.35459059

They are slightly different, and if you provide the lambda sequence used in cv.glmnet:

fit = glmnet(x=as.matrix(data[,-14]), y=data[,14], alpha=1, lambda=cvfit$lambda)
coef <- coef(fit, s = cvfit$lambda.min)
head(cbind(coef.cv,coef))

6 x 2 sparse Matrix of class "dgCMatrix"
                       1            1
(Intercept)  31.74123706  31.74123706
crim         -0.09834634  -0.09834634
zn            0.04144161   0.04144161
indus         .            .         
chas          2.68518774   2.68518774
nox         -16.30664523 -16.30664523

They are the same now. And the dev.ratio will match too:

fit$dev.ratio[fit$lambda==cvfit$lambda.min]
[1] 0.7401482
cvfit$glmnet.fit$dev.ratio[which.min(cvfit$cvm)]
[1] 0.7401482
like image 85
StupidWolf Avatar answered Oct 05 '22 01:10

StupidWolf