
Extracting predictions from a GAM model with splines and lagged predictors

Tags:

r

I have some data and am trying to teach myself how to use lagged predictors in regression models. I'm currently trying to generate predictions from a generalized additive model that uses splines to smooth the data and contains lagged terms.

Let's say I have the following data and have split the data into training and test samples.

head(mtcars)
Train <- sample(1:nrow(mtcars), ceiling(nrow(mtcars)*3/4), replace=FALSE)

Great, let's train the gam model on the training set.

f_gam <- gam(hp ~ s(qsec, bs="cr") + s(lag(disp, 1), bs="cr"), data=mtcars[Train,])

summary(f_gam)

When I go to predict on the holdout sample, I get an error message.

f_gam.pred <- predict(f_gam, mtcars[-Train,]); f_gam.pred

Error in ExtractData(object, data, NULL) : 
  'names' attribute [1] must be the same length as the vector [0]
Calls: predict ... predict.gam -> PredictMat -> Predict.matrix3 -> ExtractData

Can anyone help diagnose the issue and suggest a solution? I understand that lag(__,1) leaves a data point as NA, which is likely why the lengths differ, but I don't know how to fix it.

ATMA asked Jan 23 '26 04:01


1 Answer

I'm going to assume you're using gam() from the mgcv package. It appears that gam() cannot handle functions inside s() terms unless they are defined in "base". You can get around this by adding a column that holds the transformed variable and then modeling with that column. For example

library(mgcv)
# Compute the transformed column on the full data first, then split
tmtcars <- transform(mtcars, ldisp=lag(disp,1))
Train <- sample(1:nrow(mtcars), ceiling(nrow(mtcars)*3/4), replace=FALSE)
f_gam <- gam(hp ~ s(qsec, bs="cr") + s(ldisp, bs="cr"), data= tmtcars[Train,])
summary(f_gam)
predict(f_gam, tmtcars[-Train,])

works without error.
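One caveat worth checking, offered as a sketch and not as a claim about your setup: if lag() resolves to stats::lag() (that is, dplyr is not loaded), then calling it on a plain numeric vector only attaches a time-series attribute and leaves the values where they are, so ldisp would carry the same values as disp. The helper shift1 below is a hypothetical name introduced only for illustration; it shifts the values by one position and pads with NA.

```r
x <- c(10, 20, 30, 40)

# stats::lag() on a plain vector only sets the "tsp" attribute;
# the values themselves are not shifted
lagged <- stats::lag(x, 1)
same_values <- identical(as.numeric(lagged), x)

# Hypothetical helper: move each value down one position, pad the start with NA
shift1 <- function(v) c(NA, head(v, -1))
shifted <- shift1(x)

# Build the lagged column on the full data before splitting into train/test
tmtcars <- transform(mtcars, ldisp = shift1(disp))
```

With a column built this way, the first row has a missing ldisp, so it is worth checking how the NA is handled when fitting and predicting.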

The problem appears to be coming from the mgcv:::get.var function. It tries to decode the terms with something like

eval(parse(text = txt), data, enclos = NULL)

and because the enclosure is explicitly set to NULL, which eval() treats as the base environment when the data is a data frame, variable and function names outside of base cannot be resolved. So because mean() is in the base package, this works

eval(parse(text="mean(x)"), data.frame(x=1:4), enclos=NULL)
# [1] 2.5

but because var() is defined in stats, this does not

eval(parse(text="var(x)"), data.frame(x=1:4), enclos=NULL)
# Error in eval(expr, envir, enclos) : could not find function "var"

and lag(), like var(), is defined in the stats package.
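To see that the enclosure is what makes the difference, here is a small sketch that evaluates the same expression with two different enclos values. It only illustrates eval() itself; the variable names are mine, and it is not meant to reproduce what mgcv does internally.

```r
d <- data.frame(x = 1:4)
expr <- parse(text = "var(x)")

# enclos = NULL: lookup stops at the base environment, so stats::var is not found.
# The error is caught and its message stored instead of stopping the script.
res_null <- tryCatch(
  eval(expr, d, enclos = NULL),
  error = function(e) conditionMessage(e)
)

# enclos = globalenv(): lookup continues along the search path,
# where the attached stats package provides var()
res_global <- eval(expr, d, enclos = globalenv())
```

Here res_null holds the error message, while res_global holds the variance of 1:4.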

MrFlick answered Jan 25 '26 19:01