Using a penalized spline of mgcv, I want to obtain effective degrees of freedom (EDF) of 10 /year in the example data (60 for the entire period).
library(mgcv)
library(dlnm)
df <- chicagoNMMAPS
df1<-subset(df, as.Date(date) >= '1995-01-01')
mod1 <-gam(resp ~ s(time,bs='cr',k=6*15, fx=F)+ s(temp,k=6, bs='cr') + as.factor(dow)
,family=quasipoisson,na.action=na.omit,data=df1)
In the example data the basis dimension for time as measured by edf for time is 56.117, which is less than 10 per year.
summary(mod1)
Approximate significance of smooth terms:
edf Ref.df F p-value
s(time) 56.117 67.187 5.369 <2e-16 ***
s(temp) 2.564 3.204 0.998 0.393
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.277 Deviance explained = 28.2%
GCV score = 1.1297 Scale est. = 1.0959 n = 2192
Manually I will change the edf a by supplying smoothing parameters as follows
mod1$sp
s(time) s(temp)
23.84809 17.23785
Then I will plug the sp output into a new model and rerun it. Basically I will continue to alter the sp until I obtain edf of around 60. I will alter only the smoothing parameter for time.
I will start with a lower value and check the edf:
mod1a <-gam(resp ~ s(time,bs='cr',k=6*15, fx=F)+ s(temp,k=6, bs='cr') + as.factor(dow)
,family=quasipoisson,na.action=na.omit,data=df1, sp= c(12.84809, 17.23785
))
summary(mod1a)
# edf 62.997
I have to increase the smoothing parameters for time to bring down the edf to around 60.
mod1b <-gam(resp ~ s(time,bs='cr',k=6*15, fx=F)+ s(temp,k=6, bs='cr') + as.factor(dow)
,family=quasipoisson,na.action=na.omit,data=df1, sp= c(14.84809, 17.23785
))
summary(mod1b)
edf 61.393 ## EDF still large, thus I have to increase the sp`
mod1c <-gam(resp ~ s(time,bs='cr',k=6*15, fx=F)+ s(temp,k=6, bs='cr') + as.factor(dow)
,family=quasipoisson,na.action=na.omit,data=df1, sp=c(16.8190989, 17.23785))
summary(mod1c)
edf= 60.005 ## This is what I want to obtain as a final model.
How can one achieve this final result with an efficient code?
I don't understand the details of your model, but if you are looking to minimize (or maximize) edf
for models fitted with different sp
, optim
will do the job. First, create a function that returns just the edf
given different values of sp
.
edf.by.sp<-function(sp) {
model <-gam(resp ~ s(time,bs='cr',k=6*15, fx=F)+ s(temp,k=6, bs='cr') +
as.factor(dow),
family=quasipoisson,
na.action=na.omit,
data=df1,
sp= c(sp, 17.23785) # Not sure if this quite right.
)
abs(summary(model)$s.table['s(time)','edf']-60) # Subtract 60 and flip sign so 60 is lowest.
}
Now, you can just run optim
to minimize edf
:
# You could pick any reasonable starting sp value.
# Many optimization methods are available, but in your case
# they work equally well.
best<-optim(12,edf.by.sp,method='BFGS')$par
best
# 16.82708
and, subbing back in, you get nearly 0 (exactly 60 before transforming) when plugging in the function:
edf.by.sp(best) # 2.229869e-06
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With