Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how do I select the smoothing parameter for smooth.spline()?

I know that the smoothing parameter(lambda) is quite important for fitting a smoothing spline, but I did not see any post here regarding how to select a reasonable lambda (spar=?), I was told that spar normally ranges from 0 to 1. Could anyone share your experience when use smooth.spline()? Thanks.

    smooth.spline(x, y = NULL, w = NULL, df, spar = NULL,
          cv = FALSE, all.knots = FALSE, nknots = NULL,
          keep.data = TRUE, df.offset = 0, penalty = 1,
          control.spar = list(), tol = 1e-6 * IQR(x))
like image 924
user001 Avatar asked Feb 18 '13 03:02

user001


People also ask

How do you choose a smoothing parameter?

When choosing smoothing parameters in exponential smoothing, the choice can be made by either minimizing the sum of squared one-step-ahead forecast errors or minimizing the sum of the absolute one- step-ahead forecast errors. In this article, the resulting forecast accuracy is used to compare these two options.

How can I make my spline smoother?

You can change the level of smoothing by specifying Smoothing Parameter as a nonnegative scalar in the range [0 1]. Specify Smoothing Parameter as 0 to create a linear polynomial fit. Specify Smoothing Parameter as 1 to create a piecewise cubic polynomial fit that passes through all the data points.

What is a smoothing spline used for?

Smoothing splines are a powerful approach for estimating functional relationships between a predictor X and a response Y. Smoothing splines can be fit using either the smooth.

What is Spar in smooth spline?

The smooth. spline function in R performs these operations. The degree of smoothness is controlled by an argument called spar=, which usually ranges between 0 and 1. To illustrate, consider a data set consisting of the wheat production of the United States from 1910 to 2004.


2 Answers

agstudy provides a visual way to choose spar. I remember what I learned from linear model class (but not exact) is to use cross validation to pick "best" spar. Here's a toy example borrowed from agstudy:

x = seq(1:18)
y = c(1:3,5,4,7:3,2*(2:5),rep(10,4))
splineres <- function(spar){
  res <- rep(0, length(x))
  for (i in 1:length(x)){
    mod <- smooth.spline(x[-i], y[-i], spar = spar)
    res[i] <- predict(mod, x[i])$y - y[i]
  }
  return(sum(res^2))
}

spars <- seq(0, 1.5, by = 0.001)
ss <- rep(0, length(spars))
for (i in 1:length(spars)){
  ss[i] <- splineres(spars[i])
}
plot(spars, ss, 'l', xlab = 'spar', ylab = 'Cross Validation Residual Sum of Squares' , main = 'CV RSS vs Spar')
spars[which.min(ss)]
R > spars[which.min(ss)]
[1] 0.381

enter image description here

Code is not neatest, but easy for you to understand. Also, if you specify cv=T in smooth.spline:

R > xyspline <- smooth.spline(x, y, cv=T)
R > xyspline$spar
[1] 0.3881
like image 54
liuminzhao Avatar answered Oct 31 '22 13:10

liuminzhao


From the help of smooth.spline you have the following:

The computational λ used (as a function of \code{spar}) is λ = r * 256^(3*spar - 1)

spar can be greater than 1 (but I guess no too much). I think you can vary this parameters and choose it graphically by plotting the fitted values for different spars. For example:

spars <- seq(0.2,2,length.out=10)          ## I will choose between 10 values 
dat <- data.frame(
  spar= as.factor(rep(spars,each=18)),    ## spar to group data(to get different colors)
  x = seq(1:18),                          ## recycling here to repeat x and y 
  y = c(1:3,5,4,7:3,2*(2:5),rep(10,4)))
xyplot(y~x|spar,data =dat, type=c('p'), pch=19,groups=spar,
       panel =function(x,y,groups,...)
       {
          s2  <- smooth.spline(y,spar=spars[panel.number()])
          panel.lines(s2)
          panel.xyplot(x,y,groups,...)
       })

Here for example , I get best results for spars = 0.4

enter image description here

like image 33
agstudy Avatar answered Oct 31 '22 14:10

agstudy