Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does using "mgcv::s" in "gam(y ~ mgcv::s...)" result in an error?

Tags:

syntax

r

mgcv

I wanted to be clear and use the :: notation in the lines for fitting an mgcv::gam. I stumbled over one thing when using the notation within the model call for mgcv::s. The code with a reproducible example / error is shown below.

The reason is probably because I am using this notation within the model formula, but I could not figure out why this does not work / is not allowed. This is probably something quite specific concerning syntax (probably not mgcv specific, I guess), but maybe somebody can help me in understanding this and my understanding of R. Thank you in advance.

library(mgcv)
dat <- data.frame(x = 1:10, y = 101:110)
# this results in an error: invalid type (list)...
mgcv::gam(y ~ mgcv::s(x, bs = "cs", k = -1), data = dat)
# after removing the mgcv:: in front of s everything works fine
mgcv::gam(y ~ s(x, bs = "cs", k = -1), data = dat)

# outside of the model call, both calls return the desired function
class(s)
# [1] "function"
class(mgcv::s)
# [1] "function"
like image 513
Manuel Bickel Avatar asked Jun 26 '18 21:06

Manuel Bickel


2 Answers

Explanation

library(mgcv)
#Loading required package: nlme
#This is mgcv 1.8-24. For overview type 'help("mgcv-package")'.

f1 <- ~ s(x, bs = 'cr', k = -1)
f2 <- ~ mgcv::s(x, bs = 'cr', k = -1)

OK <- mgcv:::interpret.gam0(f1)$smooth.spec
FAIL <- mgcv:::interpret.gam0(f2)$smooth.spec

str(OK)
# $ :List of 10
#  ..$ term   : chr "x"
#  ..$ bs.dim : num -1
#  ..$ fixed  : logi FALSE
#  ..$ dim    : int 1
#  ..$ p.order: logi NA
#  ..$ by     : chr "NA"
#  ..$ label  : chr "s(x)"
#  ..$ xt     : NULL
#  ..$ id     : NULL
#  ..$ sp     : NULL
#  ..- attr(*, "class")= chr "cr.smooth.spec"

str(FAIL)
# list()

The 4th line of the source code of interpret.gam0 reveals the issue:

head(mgcv:::interpret.gam0)

1 function (gf, textra = NULL, extra.special = NULL)              
2 {                                                               
3     p.env <- environment(gf)                                    
4     tf <- terms.formula(gf, specials = c("s", "te", "ti", "t2", 
5         extra.special))                                         
6     terms <- attr(tf, "term.labels") 

Since "mgcv::s" is not to be matched, you get the problem. But mgcv does allow you the room to work around this, by passing "mgcv::s" via argument extra.special:

FIX <- mgcv:::interpret.gam0(f, extra.special = "mgcv::s")$smooth.spec
all.equal(FIX, OK)
# [1] TRUE

It is just that this is not user-controllable at high-level routine:

head(mgcv::gam, n = 10)

#1  function (formula, family = gaussian(), data = list(), weights = NULL, 
#2      subset = NULL, na.action, offset = NULL, method = "GCV.Cp",        
#3      optimizer = c("outer", "newton"), control = list(), scale = 0,     
#4      select = FALSE, knots = NULL, sp = NULL, min.sp = NULL, H = NULL,  
#5      gamma = 1, fit = TRUE, paraPen = NULL, G = NULL, in.out = NULL,    
#6      drop.unused.levels = TRUE, drop.intercept = NULL, ...)             
#7  {                                                                      
#8      control <- do.call("gam.control", control)                         
#9      if (is.null(G)) {                                                  
#10         gp <- interpret.gam(formula)  ## <- default to extra.special = NULL

I agree with Ben Bolker. It is a good exercise to dig out what happens inside, but is an over-reaction to consider this as a bug and fix it.


More insight:

s, te, etc. in mgcv does not work in the same logic with stats::poly and splines::bs.

  • When you do for example, X <- splines::bs(x, df = 10, degree = 3), it evaluates x and create a design matrix X directly.
  • When you do s(x, bs = 'cr', k = 10), no evaluation is made; it is parsed.

Smooth construction in mgcv takes several stages:

  1. parsing / interpretation by mgcv::interpret.gam, which generates a profile for a smoother;
  2. initial construction by mgcv::smooth.construct, which sets up basis / design matrix and penalty matrix (mostly done at C-level);
  3. secondary construction by mgcv::smoothCon, which picks up "by" variable (duplicating smooth for factor "by", for example), linear functional terms, null space penalty (if you use select = TRUE), penalty rescaling, centering constraint, etc;
  4. final integration by mgcv:::gam.setup, which combines all smoothers together, returning a model matrix, etc.

So, it is a far more complicated process.

like image 130
Zheyuan Li Avatar answered Nov 20 '22 05:11

Zheyuan Li


This looks like an mgcv issue. For example, the lm() function accepts poly() or stats::poly() and gives the same results (other than the names of things):

> x <- 1:100
> y <- rnorm(100)
> lm(y ~ poly(x, 3))

Call:
lm(formula = y ~ poly(x, 3))

Coefficients:
(Intercept)  poly(x, 3)1  poly(x, 3)2  poly(x, 3)3  
    0.07074      0.13631     -1.52845     -0.93285  

> lm(y ~ stats::poly(x, 3))

Call:
lm(formula = y ~ stats::poly(x, 3))

Coefficients:
       (Intercept)  stats::poly(x, 3)1  stats::poly(x, 3)2  stats::poly(x, 3)3  
           0.07074             0.13631            -1.52845            -0.93285  

It also works with the splines::bs function, so this isn't specific to poly().

You should contact the mgcv maintainer and point out this bug in that package. I'd guess it is looking specifically for s, rather than for an expression like mgcv::s that evaluates to the same thing.

like image 2
user2554330 Avatar answered Nov 20 '22 06:11

user2554330