Error with train from the caret package using method gam

Tags: r, r-caret, gam

I have a gam model that I know works just fine in R, but when I try to "train" the same model using the caret package it returns an error saying that the input data columns are lists. Does anyone understand this?

The code that I am running is as follows:

library("caret")
library("mgcv")

a <- gam(RW ~ s(Temp0.grd) + s(mld.grd) + s(mean_depth.grd) +
           s(land_dist.grd) + s(slope.grd) + s(npp.grd),
         data=mydata,
         family=binomial)

all.data.gam.train <- 
  train(form=RW ~ s(Temp0.grd) + s(mld.grd) + s(mean_depth.grd) +
          s(land_dist.grd) + s(slope.grd) + s(npp.grd),
        data=mydata,
        method='gam',
        family=binomial
  )

The first gam model works fine, but train returns the following error:

    Error in model.frame.default(form = RW ~ s(Temp0.grd) + s(mld.grd) + s(mean_depth.grd) +  : 
  invalid type (list) for variable 's(Temp0.grd)'

Running model.frame.default directly on the formula also produces this error, so the problem isn't strictly speaking with train.
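
The behaviour can be reproduced without caret. The following is a minimal sketch on simulated data (the data frame d and its columns x and y are made up for illustration and are not taken from mydata). It assumes only that mgcv is installed, and shows that s() returns a list-like smooth specification, which model.frame cannot store as a variable, while gam handles the same formula because it interprets the smooth terms itself:

```r
library(mgcv)

# Simulated stand-in data; 'd', 'x' and 'y' are hypothetical names.
set.seed(1)
d <- data.frame(x = runif(50))
d$y <- rbinom(50, 1, 0.5)

# s() does not return a numeric vector: it returns a smooth specification,
# which is a list with a class attribute.
spec <- s(x)
class(spec)
is.list(spec)

# model.frame() evaluates s(x) as if it were an ordinary variable, so it
# stops with an "invalid type (list)" error. Capture the message instead
# of aborting the script.
err <- tryCatch(
  model.frame(y ~ s(x), data = d),
  error = function(e) conditionMessage(e)
)
print(err)

# gam() interprets the smooth terms itself before building the model frame,
# so the same formula is accepted here.
fit <- gam(y ~ s(x), data = d, family = binomial)
class(fit)
```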

mydata looks as follows:

> class(mydata)
[1] "data.frame"
> class(mydata$Temp0.grd)
[1] "numeric"
> class(s(mydata$Temp0.grd))
[1] "tp.smooth.spec"
> head(mydata)
    RW land_dist.grd mean_depth.grd  mld.grd   npp.grd primprod.grd Sal0.grd salbottom.grd
372  1           172      -79.83889 14.70062 1124.6136          920 31.27995         32.70
373  0           157      -84.53555 14.70062  973.1954          889 31.27995         32.70
374  1           146      -91.53111 14.70062  896.5736          803 31.38220         32.59
375  1           137      -89.44222 14.70062  783.4132          719 31.38220         32.59
405  1           173     -100.87666 14.70062 1010.4898          755 31.27995         32.70
406  1           197     -104.24111 14.70062  816.1457          767 31.27995         32.70
    salsurf.grd seamounts_dist.grd slope.grd sst.grd Temp0.grd Temp100.grd Temp50.grd
372       30.36           1529.184 16.068041    1.77  6.532125  0.31340000    0.36470
373       30.36           1513.419 16.317524    1.77  6.532125  0.31340000    0.36470
374       30.68           1496.227  8.578011    1.68  6.466700  0.01937502   -0.04645
375       30.68           1479.382  8.134535    1.68  6.466700  0.01937502   -0.04645
405       30.36           1483.972 18.345858    1.77  6.532125  0.31340000    0.36470
406       30.36           1474.469 13.433269    1.77  6.532125  0.31340000    0.36470
    tempbottom.grd
372           1.58
373           1.58
374           1.23
375           1.23
405           1.58
406           1.58

For info, my R installation is as follows:

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mgcv_1.7-27     nlme_3.1-111    caret_5.16-04   reshape2_1.2.2  plyr_1.8       
[6] lattice_0.20-24 foreach_1.4.0   cluster_1.14.4 

loaded via a namespace (and not attached):
[1] codetools_0.2-8 grid_3.0.2      iterators_1.0.6 Matrix_1.1-0    stringr_0.6.2  
[6] tools_3.0.2    

Thanks for the assistance!

user3004015 asked Nov 18 '13 09:11



1 Answer

When you use train with this model, you cannot (at this time) specify the gam formula yourself. caret has an internal function that builds the formula based on, among other things, how many unique values each predictor has. In other words, train currently decides which terms are smoothed and which are plain linear main effects.

Try using the same code without the smooth terms indicated in the train formula and see whether it still results in an error.
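
As a rough sketch of that suggestion, the call below passes only plain predictor names and lets train decide what to smooth. The data are simulated (sim, temp, depth and slope are hypothetical names, not the columns of mydata), the outcome is coded as a factor on the assumption that train then treats the problem as classification, and the 3-fold resampling is chosen only to keep the run short. Behaviour may differ between caret versions:

```r
library(caret)
library(mgcv)

# Simulated stand-in data; all names here are hypothetical.
set.seed(42)
n <- 200
sim <- data.frame(
  temp  = runif(n, 0, 10),
  depth = runif(n, -100, 0),
  slope = runif(n, 0, 20)
)
p <- plogis(sin(sim$temp) + sim$depth / 50 + 1)
sim$RW <- factor(ifelse(rbinom(n, 1, p) == 1, "present", "absent"))

# No s() terms in the formula: only the predictor names are given,
# and train builds the gam formula internally.
fit <- train(
  RW ~ temp + depth + slope,
  data      = sim,
  method    = "gam",
  trControl = trainControl(method = "cv", number = 3)
)

# Inspect the formula that was actually fitted to see which
# predictors ended up as smooth terms.
formula(fit$finalModel)
```

Looking at the fitted formula is a quick way to check which terms train chose to smooth before comparing the result with a hand-written gam call.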

The next version of caret (probably around the start of the year) will give you much more flexibility to create your own formula with GAMs and other models.

Max

topepo answered Sep 28 '22 18:09
