Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

mgcv: How to set number and / or locations of knots for splines

I want to use function gam in mgcv packages:

 x <- seq(0,60, len =600)
 y <- seq(0,1, len=600) 
 prova <- gam(y ~ s(x, bs='cr')

can I set the number of knots in s()? and then can I know where are the knots that the spline used? Thanks!

like image 697
memy Avatar asked Oct 15 '16 07:10

memy


People also ask

How do you decide how many knots to buy?

To choose the number of knots k you should use a value larger than the number of degrees of freedom you are expecting.

What are knots in splines?

A B-spline is a piecewise polynomial, and its knots are the points where the pieces meet. A knot would have the same type as the argument to the polynomials. Generally you would also supply a value at each knot, and either a control point between each consecutive pair or a first derivative.

How many knots are in a cubic spline?

Five knots are sufficient to capture many non-linear patterns, it may not be wise to include 5 knots for each continuous predictor in a multivariable model. Too much flexibility would lead to overfitting.

What is MGCV?

mgcv is an R package for estimating penalized Generalized Linear models including Generalized Additive Models and Generalized Additive Mixed Models. mgcv includes an implementation of 'gam', based on penalized regression splines with automatic smoothness estimation.


1 Answers

While setting k is the correct way to go, fx = TRUE is definitely not right: it will force using pure regression spline without penalization.


locations of knots

For penalized regression spline, the exact locations are not important, as long as:

  • k is adequately big;
  • the spread of knots has good, reasonable coverage.

By default:

  • natural cubic regression spline bs = 'cr' places knots by quantile;
  • B-splines family (bs = 'bs', bs = 'ps', bs = 'ad') place knots evenly.

Compare the following:

library(mgcv)

## toy data
set.seed(0); x <- sort(rnorm(400, 0, pi))  ## note, my x are not uniformly sampled
set.seed(1); e <- rnorm(400, 0, 0.4)
y0 <- sin(x) + 0.2 * x + cos(abs(x))
y <- y0 + e

## fitting natural cubic spline
cr_fit <- gam(y ~ s(x, bs = 'cr', k = 20))
cr_knots <- cr_fit$smooth[[1]]$xp  ## extract knots locations

## fitting B-spline
bs_fit <- gam(y ~ s(x, bs = 'bs', k = 20))
bs_knots <- bs_fit$smooth[[1]]$knots  ## extract knots locations

## summary plot
par(mfrow = c(1,2))
plot(x, y, col= "grey", main = "natural cubic spline");
lines(x, cr_fit$linear.predictors, col = 2, lwd = 2)
abline(v = cr_knots, lty = 2)
plot(x, y, col= "grey", main = "B-spline");
lines(x, bs_fit$linear.predictors, col = 2, lwd = 2)
abline(v = bs_knots, lty = 2)

enter image description here

You can see the difference in knots placement.


Setting your own knots locations:

You can also provide your customized knots locations via the knots argument of gam() (yes, knots are not fed to s(), but to gam()). For example, you can do evenly spaced knots for cr:

xlim <- range(x)  ## get range of x
myfit <- gam(y ~ s(x, bs = 'cr', k =20),
         knots = list(x = seq(xlim[1], xlim[2], length = 20)))

Now you can see that:

my_knots <- myfit$smooth[[1]]$xp
plot(x, y, col= "grey", main = "my knots");
lines(x, myfit$linear.predictors, col = 2, lwd = 2)
abline(v = my_knots, lty = 2)

enter image description here

However, there is usually no need to set knots yourself. But if you do want to do this, you must be clear what you are doing. Also, the number of knots you provided must match k in the s().

like image 186
Zheyuan Li Avatar answered Sep 19 '22 12:09

Zheyuan Li