Is there a way of automating variable selection of a GAM in R, similar to step? I've read the documentation of step.gam and selection.gam, but I've yet to see an answer with code that works. Additionally, I've tried method = "REML" and select = TRUE, but neither removes insignificant variables from the model.
I've theorized that I could create a step model and then use those variables to create the GAM, but that does not seem computationally efficient.
Example:
library(mgcv)
set.seed(0)
dat <- data.frame(rsp = rnorm(100, 0, 1),
                  pred1 = rnorm(100, 10, 1),
                  pred2 = rnorm(100, 0, 1),
                  pred3 = rnorm(100, 0, 1),
                  pred4 = rnorm(100, 0, 1))
model <- gam(rsp ~ s(pred1) + s(pred2) + s(pred3) + s(pred4),
             data = dat, method = "REML", select = TRUE)
summary(model)
#Family: gaussian
#Link function: identity
#Formula:
#rsp ~ s(pred1) + s(pred2) + s(pred3) + s(pred4)
#Parametric coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)  0.02267    0.08426   0.269    0.788
#Approximate significance of smooth terms:
#            edf Ref.df     F p-value
#s(pred1) 0.8770      9 0.212  0.1174
#s(pred2) 1.8613      9 0.638  0.0374 *
#s(pred3) 0.5439      9 0.133  0.1406
#s(pred4) 0.4504      9 0.091  0.1775
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#R-sq.(adj) = 0.0887 Deviance explained = 12.3%
#-REML = 129.06 Scale est. = 0.70996 n = 100
With the default penalty, a smooth in a GAM fitted via mgcv can be shrunk to a linear effect but no further: the linear part lies in the null space of the penalty, so it is totally unaffected by the penalty because it has no wiggliness.
The double penalty approach, which adds a second penalty acting on the otherwise unpenalized part of each smooth, is activated in mgcv via the select = TRUE argument to gam(), which turns it on for all smooths in the model formula. Marra and Wood's (2011) results suggested that the double penalty approach worked slightly better than the shrinkage smoother approach.
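The effect of the extra penalty can be seen by comparing the effective degrees of freedom (edf) of the same model fitted with and without select = TRUE. A rough sketch, reusing the simulated data from the question; the object names m_plain, m_select and edf_compare are illustrative only:

```r
library(mgcv)

# Same simulated data as in the question: pure noise, no real effects
set.seed(0)
dat <- data.frame(rsp = rnorm(100, 0, 1),
                  pred1 = rnorm(100, 10, 1),
                  pred2 = rnorm(100, 0, 1),
                  pred3 = rnorm(100, 0, 1),
                  pred4 = rnorm(100, 0, 1))

f <- rsp ~ s(pred1) + s(pred2) + s(pred3) + s(pred4)

# Default penalty only: the linear part of each smooth is unpenalized
m_plain <- gam(f, data = dat, method = "REML")

# Double penalty: the linear part can be shrunk as well
m_select <- gam(f, data = dat, method = "REML", select = TRUE)

# Side-by-side edf per smooth term, taken from the summary tables
edf_compare <- cbind(plain  = summary(m_plain)$s.table[, "edf"],
                     select = summary(m_select)$s.table[, "edf"])
print(round(edf_compare, 3))
```

Without select = TRUE each smooth typically keeps an edf of about 1 or more, since the linear part cannot be penalized; with it, the edf of an irrelevant term can fall below 1 and approach zero.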
Marra and Wood (2011, Computational Statistics and Data Analysis 55: 2372-2387) compare various approaches for feature selection in GAMs. They concluded that an additional penalty term in the smoothness selection procedure gave the best results. This can be activated in mgcv::gam() by using the select = TRUE argument/setting, or any of the following variations:
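# Shrinkage thin plate regression splines (bs = "ts")
model <- gam(rsp ~ s(pred1, bs = "ts") + s(pred2, bs = "ts") + s(pred3, bs = "ts") + s(pred4, bs = "ts"),
             data = dat, method = "REML")
# Cubic regression splines (bs = "cr") with the double penalty
model <- gam(rsp ~ s(pred1, bs = "cr") + s(pred2, bs = "cr") + s(pred3, bs = "cr") + s(pred4, bs = "cr"),
             data = dat, method = "REML", select = TRUE)
# Shrinkage cubic regression splines (bs = "cs"); bs = "cc" is a cyclic spline, not a shrinkage smoother
model <- gam(rsp ~ s(pred1, bs = "cs") + s(pred2, bs = "cs") + s(pred3, bs = "cs") + s(pred4, bs = "cs"),
             data = dat, method = "REML")
# Thin plate regression splines (bs = "tp", the default); these need select = TRUE to allow term removal
model <- gam(rsp ~ s(pred1, bs = "tp") + s(pred2, bs = "tp") + s(pred3, bs = "tp") + s(pred4, bs = "tp"),
             data = dat, method = "REML", select = TRUE)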
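Note that neither the double penalty nor the shrinkage bases delete a term from the formula; they shrink its effective degrees of freedom (edf) toward zero. A minimal sketch of how one might read off the shrunken terms and refit without them, assuming the simulated dat from the question. The cutoff edf_cutoff is a hypothetical name and value chosen for illustration, not an mgcv setting:

```r
library(mgcv)

# Same simulated data as in the question
set.seed(0)
dat <- data.frame(rsp = rnorm(100, 0, 1),
                  pred1 = rnorm(100, 10, 1),
                  pred2 = rnorm(100, 0, 1),
                  pred3 = rnorm(100, 0, 1),
                  pred4 = rnorm(100, 0, 1))

model <- gam(rsp ~ s(pred1) + s(pred2) + s(pred3) + s(pred4),
             data = dat, method = "REML", select = TRUE)

# Table of smooth terms; row names are the term labels, e.g. "s(pred1)"
smooth_tab <- summary(model)$s.table
edf <- smooth_tab[, "edf"]

# Hypothetical cutoff: treat terms with edf below this as shrunk out of the model
edf_cutoff <- 0.1
keep <- names(edf)[edf > edf_cutoff]

# Refit with the retained terms only (intercept-only model if nothing survives)
if (length(keep) > 0) {
  reduced <- gam(reformulate(keep, response = "rsp"),
                 data = dat, method = "REML")
} else {
  reduced <- gam(rsp ~ 1, data = dat, method = "REML")
}

print(keep)
summary(reduced)
```

Whether to drop terms this way is a judgment call: the cutoff is arbitrary, and leaving the shrunken terms in the selected model is often just as reasonable.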