I want to use function gam
in mgcv
packages:
x <- seq(0,60, len =600)
y <- seq(0,1, len=600)
prova <- gam(y ~ s(x, bs='cr')
can I set the number of knots in s()
? and then can I know where are the knots that the spline used? Thanks!
To choose the number of knots k you should use a value larger than the number of degrees of freedom you are expecting.
A B-spline is a piecewise polynomial, and its knots are the points where the pieces meet. A knot would have the same type as the argument to the polynomials. Generally you would also supply a value at each knot, and either a control point between each consecutive pair or a first derivative.
Five knots are sufficient to capture many non-linear patterns, it may not be wise to include 5 knots for each continuous predictor in a multivariable model. Too much flexibility would lead to overfitting.
mgcv is an R package for estimating penalized Generalized Linear models including Generalized Additive Models and Generalized Additive Mixed Models. mgcv includes an implementation of 'gam', based on penalized regression splines with automatic smoothness estimation.
While setting k
is the correct way to go, fx = TRUE
is definitely not right: it will force using pure regression spline without penalization.
locations of knots
For penalized regression spline, the exact locations are not important, as long as:
k
is adequately big;By default:
bs = 'cr'
places knots by quantile;bs = 'bs'
, bs = 'ps'
, bs = 'ad'
) place knots evenly.Compare the following:
library(mgcv)
## toy data
set.seed(0); x <- sort(rnorm(400, 0, pi)) ## note, my x are not uniformly sampled
set.seed(1); e <- rnorm(400, 0, 0.4)
y0 <- sin(x) + 0.2 * x + cos(abs(x))
y <- y0 + e
## fitting natural cubic spline
cr_fit <- gam(y ~ s(x, bs = 'cr', k = 20))
cr_knots <- cr_fit$smooth[[1]]$xp ## extract knots locations
## fitting B-spline
bs_fit <- gam(y ~ s(x, bs = 'bs', k = 20))
bs_knots <- bs_fit$smooth[[1]]$knots ## extract knots locations
## summary plot
par(mfrow = c(1,2))
plot(x, y, col= "grey", main = "natural cubic spline");
lines(x, cr_fit$linear.predictors, col = 2, lwd = 2)
abline(v = cr_knots, lty = 2)
plot(x, y, col= "grey", main = "B-spline");
lines(x, bs_fit$linear.predictors, col = 2, lwd = 2)
abline(v = bs_knots, lty = 2)
You can see the difference in knots placement.
Setting your own knots locations:
You can also provide your customized knots locations via the knots
argument of gam()
(yes, knots are not fed to s()
, but to gam()
). For example, you can do evenly spaced knots for cr
:
xlim <- range(x) ## get range of x
myfit <- gam(y ~ s(x, bs = 'cr', k =20),
knots = list(x = seq(xlim[1], xlim[2], length = 20)))
Now you can see that:
my_knots <- myfit$smooth[[1]]$xp
plot(x, y, col= "grey", main = "my knots");
lines(x, myfit$linear.predictors, col = 2, lwd = 2)
abline(v = my_knots, lty = 2)
However, there is usually no need to set knots yourself. But if you do want to do this, you must be clear what you are doing. Also, the number of knots you provided must match k
in the s()
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With