I have a `gam` model that I know works fine in R, but when I try to "train" the same model using the `caret` package, it returns an error saying that the input data columns are lists. Does anyone understand why?
The code that I am running is as follows:
```r
library("caret")
library("mgcv")

# Fitting the model directly with mgcv::gam works
a <- gam(RW ~ s(Temp0.grd) + s(mld.grd) + s(mean_depth.grd) +
           s(land_dist.grd) + s(slope.grd) + s(npp.grd),
         data = mydata,
         family = binomial)

# Passing the same formula to caret::train fails (error below)
all.data.gam.train <-
  train(form = RW ~ s(Temp0.grd) + s(mld.grd) + s(mean_depth.grd) +
          s(land_dist.grd) + s(slope.grd) + s(npp.grd),
        data = mydata,
        method = 'gam',
        family = binomial)
```
The first `gam` model works fine, but `train` returns the following error:

```
Error in model.frame.default(form = RW ~ s(Temp0.grd) + s(mld.grd) + s(mean_depth.grd) +  :
  invalid type (list) for variable 's(Temp0.grd)'
```
Running `model.frame.default` directly on the formula also produces this error, so strictly speaking the problem isn't with `train`.
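To see why, here is a minimal reproduction on simulated data (the names `d`, `x`, `y`, `spec`, `ok_fit`, `err` and `mf` are made up for illustration; it assumes only `mgcv` is attached). Outside of `gam()`, a call like `s(x)` just returns a smooth specification object, which is an ordinary R list with a class attribute. `gam()` recognizes such terms while parsing its formula, but `model.frame()` evaluates `s(x)` as if it were a data column and rejects the list, which should give the same kind of "invalid type (list)" error as in the question.

```r
# Minimal reproduction on simulated data (all names are hypothetical)
library(mgcv)  # provides s() and gam(); mgcv ships with R as a recommended package

set.seed(1)
d <- data.frame(x = runif(100))
d$y <- rbinom(100, 1, plogis(2 * d$x - 1))

# Outside gam(), s() only builds a smooth *specification*, stored as a list
spec <- s(x)
print(class(spec))    # the class seen in the question: "tp.smooth.spec"
print(is.list(spec))

# gam() parses s() terms itself, so this fit works
ok_fit <- gam(y ~ s(x), data = d, family = binomial)

# model.frame() evaluates s(x) as an ordinary variable and rejects the list
err <- tryCatch(model.frame(y ~ s(x), data = d), error = function(e) e)
print(conditionMessage(err))

# Without s(), the same columns pass through model.frame() without complaint
mf <- model.frame(y ~ x, data = d)
```

This is why any code path that hands the `s()` formula to `model.frame()` before `gam()` sees it will fail, whichever package it goes through.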
mydata looks as follows:
```
> class(mydata)
[1] "data.frame"
> class(mydata$Temp0.grd)
[1] "numeric"
> class(s(mydata$Temp0.grd))
[1] "tp.smooth.spec"
> head(mydata)
    RW land_dist.grd mean_depth.grd  mld.grd   npp.grd primprod.grd Sal0.grd salbottom.grd
372  1           172      -79.83889 14.70062 1124.6136          920 31.27995         32.70
373  0           157      -84.53555 14.70062  973.1954          889 31.27995         32.70
374  1           146      -91.53111 14.70062  896.5736          803 31.38220         32.59
375  1           137      -89.44222 14.70062  783.4132          719 31.38220         32.59
405  1           173     -100.87666 14.70062 1010.4898          755 31.27995         32.70
406  1           197     -104.24111 14.70062  816.1457          767 31.27995         32.70
    salsurf.grd seamounts_dist.grd slope.grd sst.grd Temp0.grd Temp100.grd Temp50.grd
372       30.36           1529.184 16.068041    1.77  6.532125  0.31340000    0.36470
373       30.36           1513.419 16.317524    1.77  6.532125  0.31340000    0.36470
374       30.68           1496.227  8.578011    1.68  6.466700  0.01937502   -0.04645
375       30.68           1479.382  8.134535    1.68  6.466700  0.01937502   -0.04645
405       30.36           1483.972 18.345858    1.77  6.532125  0.31340000    0.36470
406       30.36           1474.469 13.433269    1.77  6.532125  0.31340000    0.36470
    tempbottom.grd
372           1.58
373           1.58
374           1.23
375           1.23
405           1.58
406           1.58
```
For info, my R installation is as follows:
```
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] mgcv_1.7-27     nlme_3.1-111    caret_5.16-04   reshape2_1.2.2  plyr_1.8
[6] lattice_0.20-24 foreach_1.4.0   cluster_1.14.4

loaded via a namespace (and not attached):
[1] codetools_0.2-8 grid_3.0.2      iterators_1.0.6 Matrix_1.1-0    stringr_0.6.2
[6] tools_3.0.2
```
Thanks for the assistance!
When you use `train` with this model, you cannot (at this time) specify the `gam` formula yourself. `caret` has an internal function that builds the formula based on how many unique values each predictor has, among other things. In other words, `train` currently decides which terms are smoothed and which are plain old linear main effects.
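A rough sketch of that kind of rule, using a hypothetical helper `build_gam_formula()` and an arbitrary cutoff of 10 distinct values. This is not caret's internal function or its actual threshold; it only illustrates deciding smooth versus linear terms from the data rather than from a user-supplied formula. The simulated data frame `d` and its columns `temp` and `depth_class` are invented.

```r
library(mgcv)  # gam() and s()

# Hypothetical helper, NOT caret's internal code: smooth a numeric predictor
# only when it has more than `min_unique` distinct values, else keep it linear.
build_gam_formula <- function(data, response, min_unique = 10) {
  predictors <- setdiff(names(data), response)
  term_labels <- vapply(predictors, function(v) {
    x <- data[[v]]
    if (is.numeric(x) && length(unique(x)) > min_unique) {
      paste0("s(", v, ")")  # many distinct values: fit a smooth
    } else {
      v                     # few distinct values or non-numeric: linear term
    }
  }, character(1))
  reformulate(term_labels, response = response)
}

# Simulated data: one continuous predictor, one with only three values
set.seed(2)
d <- data.frame(temp = runif(150, 0, 8),
                depth_class = sample(1:3, 150, replace = TRUE))
d$y <- rbinom(150, 1, plogis(d$temp - 4))

f <- build_gam_formula(d, "y")
print(f)  # temp should be smoothed, depth_class kept linear
fit <- gam(f, data = d, family = binomial)
```

A cutoff of this kind is one reasonable guard, because `mgcv` refuses to fit a default-sized smooth to a covariate with fewer distinct values than the basis dimension.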
Try the same code without the smooth-term indicators (`s()`) in the `train` formula and see whether it still results in an error.
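A sketch of that suggestion on simulated data (the data frame `sim` and its values are invented; only the column names echo the question). The formula lists plain predictors and `train` adds the smooths itself. It assumes `caret` and `mgcv` are installed and that the outcome is a factor, which makes `train` treat the problem as classification; with the original data that would mean converting `RW` to a factor first.

```r
library(caret)  # train(), trainControl()
library(mgcv)   # gam(), the backend for method = "gam"

# Simulated stand-in for mydata (values are invented)
set.seed(42)
n <- 200
sim <- data.frame(Temp0.grd = runif(n, 0, 8),
                  slope.grd = runif(n, 0, 20))
p <- plogis(-2 + 0.5 * sim$Temp0.grd)
sim$RW <- factor(ifelse(rbinom(n, 1, p) == 1, "present", "absent"))

# Plain predictors only: no s() terms, train builds the gam formula itself
fit <- train(RW ~ Temp0.grd + slope.grd,
             data = sim,
             method = "gam",
             trControl = trainControl(method = "cv", number = 5))

# Shows which terms train chose to smooth in the final model
print(formula(fit$finalModel))
```

Inspecting the final model's formula is a quick way to check whether `train` smoothed the predictors you expected.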
The next version of `caret` (probably around the start of the year) will give you much more flexibility to create your own formula with GAMs and other models.
Max