
Variable Selection with mgcv

Tags: r, mgcv, gam

Is there a way to automate variable selection for a GAM in R, similar to step? I've read the documentation for step.gam and selection.gam, but I've yet to see an answer with code that works. Additionally, I've tried method = "REML" and select = TRUE, but neither removes insignificant variables from the model.

I've theorized that I could create a step model and then use those variables to create the GAM, but that does not seem computationally efficient.

Example:

library(mgcv)

set.seed(0)
dat <- data.frame(rsp = rnorm(100, 0, 1), 
                  pred1 = rnorm(100, 10, 1), 
                  pred2 = rnorm(100, 0, 1), 
                  pred3 = rnorm(100, 0, 1), 
                  pred4 = rnorm(100, 0, 1))

model <- gam(rsp ~ s(pred1) + s(pred2) + s(pred3) + s(pred4),
             data = dat, method = "REML", select = TRUE)

summary(model)

#Family: gaussian 
#Link function: identity 

#Formula:
#rsp ~ s(pred1) + s(pred2) + s(pred3) + s(pred4)

#Parametric coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)  0.02267    0.08426   0.269    0.788

#Approximate significance of smooth terms:
#            edf Ref.df     F p-value  
#s(pred1) 0.8770      9 0.212  0.1174  
#s(pred2) 1.8613      9 0.638  0.0374 *
#s(pred3) 0.5439      9 0.133  0.1406  
#s(pred4) 0.4504      9 0.091  0.1775  
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#R-sq.(adj) =  0.0887   Deviance explained = 12.3%
#-REML = 129.06  Scale est. = 0.70996   n = 100
Asked by IJH, Jul 25 '16 14:07



1 Answer

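Marra and Wood (2011, Computational Statistics and Data Analysis 55: 2372-2387) compare various approaches to variable selection in GAMs. They concluded that an additional penalty term in the smoothness selection procedure gave the best results. In mgcv::gam() this extra penalty is switched on with select = TRUE. Shrinkage smoothers (bs = "ts" or bs = "cs") are a related alternative that builds the shrinkage into the basis itself. Either way, a term is never dropped from the formula: it is penalized toward zero, so an unneeded smooth shows up in summary() with an edf close to 0. Some variations:

# Shrinkage smoothers: the basis penalty lets the whole term shrink to zero
model <- gam(rsp ~ s(pred1, bs = "ts") + s(pred2, bs = "ts") + s(pred3, bs = "ts") + s(pred4, bs = "ts"),
             data = dat, method = "REML")
model <- gam(rsp ~ s(pred1, bs = "cs") + s(pred2, bs = "cs") + s(pred3, bs = "cs") + s(pred4, bs = "cs"),
             data = dat, method = "REML")

# Double penalty: select = TRUE adds a second penalty on each smooth's null space
model <- gam(rsp ~ s(pred1, bs = "cr") + s(pred2, bs = "cr") + s(pred3, bs = "cr") + s(pred4, bs = "cr"),
             data = dat, method = "REML", select = TRUE)
model <- gam(rsp ~ s(pred1, bs = "tp") + s(pred2, bs = "tp") + s(pred3, bs = "tp") + s(pred4, bs = "tp"),
             data = dat, method = "REML", select = TRUE)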
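If you actually want the shrunken terms gone from the formula, one option is to read the edf column from the summary table and refit with only the terms that survive. The following is a minimal sketch of that idea, not a method from Marra and Wood or a built-in mgcv feature. The edf_cutoff value and the names keep, new_formula and reduced are hypothetical choices made for illustration, and the cutoff in particular is arbitrary.

```r
library(mgcv)

# Same simulated data as in the question (the response is pure noise)
set.seed(0)
dat <- data.frame(rsp = rnorm(100, 0, 1),
                  pred1 = rnorm(100, 10, 1),
                  pred2 = rnorm(100, 0, 1),
                  pred3 = rnorm(100, 0, 1),
                  pred4 = rnorm(100, 0, 1))

# Fit with the double penalty so unneeded smooths can shrink toward zero edf
model <- gam(rsp ~ s(pred1) + s(pred2) + s(pred3) + s(pred4),
             data = dat, method = "REML", select = TRUE)

# Per-term effective degrees of freedom, named by smooth label, e.g. "s(pred1)"
edf <- summary(model)$s.table[, "edf"]

# ASSUMPTION: an arbitrary cutoff chosen for illustration only
edf_cutoff <- 1
keep <- names(edf)[edf > edf_cutoff]

# Rebuild the formula from the surviving terms (intercept-only if none survive)
new_formula <- reformulate(if (length(keep) > 0) keep else "1", response = "rsp")

# Refit without the extra penalty, using only the retained smooths
reduced <- gam(new_formula, data = dat, method = "REML")
summary(reduced)
```

Bear in mind that p-values from the refitted model do not account for the selection step, so treat them with caution. It is often reasonable to simply leave the shrunken terms in the model, since they contribute almost nothing to the fit.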
Answered by Hack-R, Oct 15 '22 06:10