I am running a regression with 67 observations and 32 variables. I am doing variable selection using the cv.glmnet function from the glmnet package. There is one variable I want to force into the model. (It is dropped during the normal procedure.) How can I specify this condition in cv.glmnet?
Thank you!
My code looks like the following:
glmntfit <- cv.glmnet(mydata[,-1], mydata[,1])
coef(glmntfit, s=glmntfit$lambda.1se)
And the variable I want is mydata[,2].
In glmnet, alpha is the elastic net mixing parameter α, with range α∈[0,1]. α=1 is lasso regression (the default) and α=0 is ridge regression.
If standardize = FALSE, glmnet doesn't standardize x; it assumes that this was done beforehand. Well, I guess we should both check the documentation again...
By default, coef() and predict() on a cv.glmnet fit use lambda.1se. This is the largest λ at which the cross-validated MSE is within one standard error of the minimal MSE. It usually reduces overfitting by selecting a simpler model (fewer nonzero coefficients) whose error is still close to that of the model with the least error.
cv.glmnet() performs cross-validation, by default 10-fold, which can be adjusted using nfolds. A 10-fold CV randomly divides your observations into 10 non-overlapping groups (folds) of approximately equal size. The first fold is used as the validation set while the model is fit on the remaining 9 folds; this is repeated so that each fold serves as the validation set once.
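To make the difference between lambda.min and lambda.1se concrete, here is a minimal sketch on simulated data. The matrix x, the response y, the seed, and the choice of nfolds = 5 are all made up for illustration and are not from the question.

```r
library(glmnet)

set.seed(42)
# Simulated stand-in for the real data: 67 observations, 31 predictors
x <- matrix(rnorm(67 * 31), nrow = 67, ncol = 31)
y <- x[, 1] + rnorm(67)

# Lasso (alpha = 1) with 5-fold instead of the default 10-fold CV
cvfit <- cv.glmnet(x, y, alpha = 1, nfolds = 5)

# lambda.1se is never smaller than lambda.min
cvfit$lambda.min
cvfit$lambda.1se

# Count nonzero coefficients (excluding the intercept) at each lambda
n_min <- sum(as.matrix(coef(cvfit, s = "lambda.min"))[-1, 1] != 0)
n_1se <- sum(as.matrix(coef(cvfit, s = "lambda.1se"))[-1, 1] != 0)
c(lambda.min = n_min, lambda.1se = n_1se)
```

Because the folds are assigned at random, the selected λ values can change from run to run unless a seed is set.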
This can be achieved by providing a penalty.factor vector, as described in ?glmnet. A penalty factor of 0 indicates that the "variable is always included in the model", while 1 is the default.
# mydata[, 2] is the first column of x, so its penalty factor (0) comes first
glmntfit <- cv.glmnet(mydata[, -1], mydata[, 1],
                      penalty.factor = c(0, rep(1, ncol(mydata) - 2)))
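One way to check that the forced variable really stays in the model is sketched below on simulated data. Everything here (x, y, the column names v1 to v31, the seed) is hypothetical; v1 plays the role of mydata[,2] and is deliberately unrelated to the response, so it would normally be a candidate for being dropped.

```r
library(glmnet)

set.seed(1)
n <- 67
p <- 31
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(x) <- paste0("v", seq_len(p))

# The response depends only on v5 and v6; v1 is pure noise
y <- 2 * x[, "v5"] - 1.5 * x[, "v6"] + rnorm(n)

# Penalty factor 1 (the default) everywhere, 0 for the forced variable
pf <- rep(1, p)
pf[1] <- 0

fit <- cv.glmnet(x, y, penalty.factor = pf)

# Coefficients at lambda.1se as an ordinary matrix with row names
b <- as.matrix(coef(fit, s = "lambda.1se"))
b["v1", 1]   # unpenalized, so it is not shrunk to zero
```

Indexing pf by position keeps the intent visible when the forced variable is not the first column of x.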