Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multivariate regression splines in R

Most people are probably familiar with bs from splines:

library(splines)

workingModel <- lm(mpg ~ factor(gear) + bs(wt, knots = 5) + hp, data = mtcars)

bs(mtcars$wt, knots = 4)

This uses a b-spline for the singe variable weight, but you can also do multivariate splines:

bs(cbind(mtcars$wt,mtcars$hp), knots = 4)

But this produces a matrix with twice as many rows as mtcars, so when I try:

brokenModel <- lm(mpg ~ bs(cbind(mtcars$wt,mtcars$hp), knots = 4), data = mtcars)

I get an error about differing lengths.

My question is: how do I use the multivariate spline in a model if it has a different number of rows than my outcome variable? Do I stack my outcome variable on top of itself y <- c(y, y)? Why does the multivariate spline produce extra rows?

Thanks.

like image 637
Carl Avatar asked Jan 12 '17 21:01

Carl


Video Answer


1 Answers

You can't use splines::bs in this case, as it is strictly for construction of a univariate spline. If you do bs(mat) where mat is a matrix, it is just doing bs(c(mat)). For example,

mat <- matrix(runif(8), 4, 2)
identical(bs(mat), bs(c(mat)))
# [1] TRUE

This explains why you get double number of rows, when doing bs(cbind(mtcars$wt,mtcars$hp).


To create a 2D spline, the simplest way is to create additive spline:

lm(mpg ~ factor(gear) + bs(wt, knots = 5) + bs(hp, knots = 4), mtcars)

but this may not be what you want. Then consider interaction:

model <- lm(mpg ~ factor(gear) + bs(wt, knots = 5):bs(hp, knots = 4), mtcars)

The bs(wt, knots = 5):bs(hp, knots = 4) is forming row-wise Kronecker product between the two design matrices. Since bs(wt, knots = 5) is a matrix of 4 columns, and bs(hp, knots = 4) is a matrix of 3 columns, the interaction has 4 * 3 = 12 columns.


Alternatively, consider using mgcv package. In mgcv, multivariate splines can be constructed in two ways:

  • isotropic thin-plate splines;
  • scale invariant tensor product splines.

Clearly you want the second here, as wt and hp have different units. To construct tensor product splines, we can use:

library(mgcv)
fit <- gam(mpg ~ factor(gear)
                 + s(wt, bs = 'cr', k = 4, fx = TRUE)
                 + s(hp, bs = 'cr', k = 4, fx = TRUE)
                 + ti(wt, hp, bs = 'cr', k = c(4, 4), d = c(1, 1), fx = TRUE),
                 data = mtcars)

Here, I specially set fx = TRUE to disable penalized regression.

I don't want to write an extensive answer to introduce mgcv. For how s, ti and gam work, just read documentation. If you need to bridge the gap in theory, grab Simon Wood's book published in 2006: Generalized Additive Models: an introduction with R.


A practical example of mgcv usage?

I had an answer Cubic spline method for longitudinal series data which might help you get familiar with mgcv. But as an introductory example, it only shows how to work with univariate spline. Fortunately, that is also the key. Tensor product spline is constructed from univariate spline.

My other answers related to mgcv is more of theoretical aspect; while not all my answers related to spline is making reference to mgcv. So that question & answer is the best I could give you at this stage.

Would the scale invariant tensor product splines be equivalent to radial smoothing or would that be the isotropic thin-place splines?

Radial smoothing is equivalent to thin-plate spline, as the basis function for a thin-plate spline is radial. That is why it is isotropic and can be used in spatial regression.

Tensor product spline is scale invariant, as it is constructed as (pairwise) multiplication of univariate spline basis.

like image 149
Zheyuan Li Avatar answered Sep 22 '22 09:09

Zheyuan Li