Difference between glmnet() and cv.glmnet() in R?

I'm working on a project that would show the potential influence a group of events has on an outcome. I'm using the glmnet package, specifically its Poisson family. Here's my code:

# de <- data imported from sql connection        
x <- model.matrix(~.,data = de[,2:7])
y <- (de[,1])
reg <- cv.glmnet(x,y, family = "poisson", alpha = 1)
reg1 <- glmnet(x,y, family = "poisson", alpha = 1)

**Co <- coef(?reg or reg1?,s=???)**

summ <- summary(Co)
c <- data.frame(Name= rownames(Co)[summ$i],
       Lambda= summ$x)
c2 <- c[with(c, order(-Lambda)), ]

The beginning imports a large amount of data from my database in SQL. I then put it in matrix format and separate the response from the predictors.

This is where I'm confused: I can't figure out exactly what the difference is between the glmnet() function and the cv.glmnet() function. I realize that the cv.glmnet() function is a k-fold cross-validation of glmnet(), but what exactly does that mean in practical terms? They provide the same value for lambda, but I want to make sure I'm not missing something important about the difference between the two.

I'm also unclear as to why it runs fine when I specify alpha=1 (supposedly the default), but not if I leave it out?

Thanks in advance!

asked Mar 27 '15 22:03

Sean Branchaw

2 Answers

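glmnet is an R package for fitting penalized (regularized) generalized linear models, including lasso, ridge, and elastic-net models. In glmnet(), the alpha argument determines the type of penalty: alpha = 0 fits a ridge model, alpha = 1 (the default) fits a lasso model, and values in between give an elastic-net mix of the two.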
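As a rough illustration of how alpha changes the fit, here is a minimal sketch on simulated Poisson data. The data, the variable names (x1 to x6) and the effect sizes are made up for the example and have nothing to do with the data in the question.

```r
library(glmnet)

set.seed(1)
n <- 200
p <- 6
# Simulated predictors (hypothetical stand-in for the real data)
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
# Only x1 and x2 influence the simulated Poisson rate
y <- rpois(n, lambda = exp(0.5 * x[, 1] - 0.5 * x[, 2]))

ridge <- glmnet(x, y, family = "poisson", alpha = 0)    # ridge penalty
lasso <- glmnet(x, y, family = "poisson", alpha = 1)    # lasso penalty (default alpha)
enet  <- glmnet(x, y, family = "poisson", alpha = 0.5)  # elastic-net mix

# Number of nonzero coefficients at each lambda along the path.
# The lasso can set coefficients exactly to zero, so its df varies with lambda.
range(lasso$df)
range(ridge$df)
```

Each call returns a whole path of fits, one per lambda value, not a single model.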

cv.glmnet() performs k-fold cross-validation, 10-fold by default, which can be adjusted with nfolds. A 10-fold CV randomly divides your observations into 10 non-overlapping groups (folds) of approximately equal size. Each fold in turn is held out as the validation set while the model is fit on the other 9 folds, and the validation errors are then averaged over the 10 folds. The usual motivation for this kind of validation is a good bias-variance trade-off in the estimate of the test error. For lasso and ridge models, CV helps choose the value of the tuning parameter lambda.
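The fold splitting described above can be sketched in base R. This is only an illustration of the idea, not a copy of what cv.glmnet() does internally; the sample size of 103 is an arbitrary assumption.

```r
set.seed(42)
n <- 103        # hypothetical number of observations
nfolds <- 10

# Randomly assign every observation to exactly one of the 10 folds
foldid <- sample(rep(seq_len(nfolds), length.out = n))
fold_sizes <- table(foldid)   # fold sizes differ by at most one

held_out <- integer(0)
for (k in seq_len(nfolds)) {
  train_idx <- which(foldid != k)  # the other 9 folds: used for fitting
  valid_idx <- which(foldid == k)  # fold k: held out for validation
  # A model would be fit on train_idx and scored on valid_idx here
  held_out <- c(held_out, valid_idx)
}
```

After the loop, every observation has been held out exactly once. cv.glmnet() also accepts a foldid argument if you want to supply your own fold assignment, for example to make repeated runs comparable.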

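In your example, you can use plot(reg) or reg$lambda.min to see the value of lambda that gives the smallest CV error, and read the cross-validated error for each lambda from reg$cvm (for family = "poisson" the default error measure is the deviance rather than the MSE). By default, glmnet() fits the ridge or lasso model over an automatically selected sequence of lambda values, but it does not tell you which of them gives the lowest test error; that is what cv.glmnet() adds. For the coefficients, you can call coef(reg, s = "lambda.min").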
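Here is a small sketch of how the two functions fit together, again on simulated data (the data and variable names are assumptions made for the example). It extracts the coefficients at the CV-chosen lambda from the cv.glmnet() object, and then pulls the coefficients at that same lambda from a plain glmnet() path.

```r
library(glmnet)

set.seed(1)
n <- 200
p <- 6
# Simulated stand-in for the real predictors and Poisson response
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- rpois(n, lambda = exp(0.5 * x[, 1] - 0.5 * x[, 2]))

cvfit <- cv.glmnet(x, y, family = "poisson", alpha = 1)  # path + 10-fold CV
fit   <- glmnet(x, y, family = "poisson", alpha = 1)     # path only, no CV

cvfit$lambda.min   # lambda with the smallest cross-validated deviance
cvfit$lambda.1se   # largest lambda within one standard error of that minimum

# Coefficients at the CV-chosen lambda, taken from the cv.glmnet object
co_cv <- coef(cvfit, s = "lambda.min")

# Coefficients at the same lambda, taken from the plain glmnet path
co_path <- coef(fit, s = cvfit$lambda.min)
```

The two coefficient vectors should agree, because cv.glmnet() also fits the full-data path and only uses the folds to score each lambda. The practical difference is that glmnet() alone gives you no guidance on which s to pass to coef().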

Hope this helps!

answered Sep 23 '22 17:09

Amrita Sawant

Between reg$lambda.min and reg$lambda.1se: lambda.min gives the lowest cross-validated error by definition. However, depending on how flexible you can be with the error, you may want to choose reg$lambda.1se, the largest lambda whose CV error is within one standard error of the minimum, as this stronger penalty typically leaves fewer predictors in the model. You may also choose a value between the two, such as the mean of reg$lambda.min and reg$lambda.1se, as your lambda.
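To compare the candidate lambda values side by side, something like the following sketch may help. The simulated data and the helper count_nonzero() are hypothetical and only there for illustration.

```r
library(glmnet)

set.seed(1)
n <- 200
p <- 6
# Simulated stand-in for the real data
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- rpois(n, lambda = exp(0.5 * x[, 1] - 0.5 * x[, 2]))

cvfit <- cv.glmnet(x, y, family = "poisson", alpha = 1)

# Hypothetical helper: number of nonzero coefficients, excluding the intercept
count_nonzero <- function(fit, s) {
  sum(as.matrix(coef(fit, s = s))[-1, 1] != 0)
}

lam_min <- cvfit$lambda.min
lam_1se <- cvfit$lambda.1se
lam_mid <- mean(c(lam_min, lam_1se))   # the "mean of the two" option

nz <- c(min = count_nonzero(cvfit, lam_min),
        mid = count_nonzero(cvfit, lam_mid),
        se1 = count_nonzero(cvfit, lam_1se))
nz
```

Note that lam_mid is usually not one of the lambda values on the fitted path, so by default coef() interpolates linearly between the neighbouring fits rather than refitting the model.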

answered Sep 21 '22 17:09

OSK

Tags: r, classification, glm, cross-validation, glmnet
