Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Retrieve/reproduce design matrix from smooth.spline in R

Tags:

r

spline

I would like to know how to retrieve or reproduce the design matrix used under the hood by smooth.spline. It does not appear that smooth.spline has a way to return this matrix. As such, I am trying to reproduce the matrix by using the bs function.

When I run smooth.spline and specify the knots via the all.knots argument, the number of parameters from the fit is off by one compared to the bspline matrix I get by passing the same knots to the bs function. Can someone help me understand how to reproduce the training matrix that's being used under the hood by smooth.spline given a known set of knots? Thanks!

Here is code that reproduces the behavior:

n = 200
y = rnorm(n=n)
x = runif(n=n)
knots = c(0,seq(0.001,0.999,.01),1)
mod = smooth.spline(x,y,all.knots=knots,keep.stuff=TRUE)

X = bs(x,knots=knots)

dim(X)[2] #105 
length(mod$fit$coef) #104
like image 404
curious_dan Avatar asked Nov 29 '25 17:11

curious_dan


1 Answers

This doesn't explain how to use the internal data of the output of smooth.spline to get the smoother matrix but we can get that matrix by pumping an identity matrix through smooth.spline (which is also a general method of finding the matrix of a linear operator). Also see https://www.stat.cmu.edu/~cshalizi/dst/18/hw/02/smoother.matrix.R

set.seed(123)
n <- 200
y <- rnorm(n)
x <- runif(n)
mod <- smooth.spline(x, y, keep.stuff = TRUE)

S <- apply(diag(n), 2, function(y) fitted(smooth.spline(x, y, df = mod$df)))

max(abs(S %*% y - fitted(mod)))
## [1] 2.183287e-08

If you only need the diagonal of the smoother matrix then hatvalues(mod) will give that.

max(abs(diag(S) - hatvalues(mod)))
## [1] 1.676781e-08

This computes a matrix X which satisfies Xb = Sy.

b <- mod$fit$coef
X <- tcrossprod(fitted(mod), b/c(crossprod(b)))

max(abs(X %*% b - S %*% y))
## [1] 4.959864e-09

The equation is satisfied because if X = (Sy)b'/(b'b) then Xb = ((Sy)b'/(b'b))b = (Sy)(b'b)/(b'b) = Sy .

Note that such an X is not unique since if we add vectors orthogonal to b to rows of X then the equation is still satisfied.

Although the X we showed does satisfy the equation asked for I am not sure if it really is what you want. Note that looking at the source we can convert mod$auxM to a list of matrices like this:

aux2Mat <- function(auxM) {
    stopifnot(is.list(auxM),
              identical(vapply(auxM, class, ""),
                        setNames(rep("numeric", 4), c("XWy", "XWX", "Sigma", "R"))))
    ## requireNamespace("Matrix")# want sparse matrices
    nk <- length(XWy <- auxM[["XWy"]])
    list(XWy = XWy,
         XWX =  Matrix::bandSparse(nk, k= 0:3, diagonals= matrix(auxM[[ "XWX" ]], nk,4), symmetric=TRUE),
         Sigma= Matrix::bandSparse(nk, k= 0:3, diagonals= matrix(auxM[["Sigma"]], nk,4), symmetric=TRUE))
}
lM <- aux2Mat(mod$auxM)
A <- lM$XWX + mod$lambda * lM$Sigma
R <- Matrix::chol(A)

The following two lines give the same result:

bb <- solve(A, lM$XWy)
b <- mod$fit$coef

max(abs(b - bb))
## [1] 1.125863e-10

Also these two are the same

crossprod(b, as.matrix(lM$XWX)) %*% b
##            [,1]
## [1,] 0.01597881
crossprod(X %*% b)
##            [,1]
## [1,] 0.01597881

and these two are the same

y %*% X %*% b
##            [,1]
## [1,] 0.02039534
lM$XWy %*% b
##            [,1]
## [1,] 0.02039534

Update

Corrections.

like image 103
G. Grothendieck Avatar answered Dec 02 '25 05:12

G. Grothendieck