
caret: Error when using anything but LOOCV with rpart

Tags:

r

I'm trying to use the R caret package for model generation, and I want to use cross-validation. As far as I can tell, the only resampling method that works together with rpart is LOOCV (leave-one-out cross-validation).

The following code produces the warning shown below it:

library(caret)
data(trees)
formula <- Volume ~ Girth + Height
train(formula, data = trees, method = 'rpart')

Warning message:
In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method, :
  There were missing values in resampled performance measures.

What does this message mean and how do I make it go away? I searched the internet and found not a single hit for it. I traced it to the rpart model generation: only rpart produces this message, while all other model-generation methods work fine!

Everything works fine if I use LOOCV.

I traced the warning down to the workflows.R file, but I do not understand why this warning gets thrown.

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] earth_3.2-3           plotrix_3.4           plotmo_1.3-1         
 [4] leaps_2.9             doMC_1.2.5            multicore_0.1-7      
 [7] iterators_1.0.6       forecast_3.20         RcppArmadillo_0.3.0.2
[10] Rcpp_0.9.10           fracdiff_1.4-1        tseries_0.10-28      
[13] zoo_1.7-7             quadprog_1.5-4        caret_5.15-023       
[16] foreach_1.4.0         cluster_1.14.2        reshape_0.8.4        
[19] plyr_1.7.1            lattice_0.20-6        mda_0.4-2            
[22] class_7.3-3           rpart_3.1-52          data.table_1.8.0     

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.0 grid_2.15.0    
theomega asked May 08 '12 17:05


1 Answer

With the help of the R mailing list and the caret author, I found the following explanation:

The warning occurs when the generated model is constant, meaning that it yields the same prediction for every input. In that case the calculation of R^2, which caret performs by default, fails and produces a missing value. Since caret does not use R^2 for model selection here (the default metric for regression is RMSE), you can ignore the warning.
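A minimal sketch of why a constant prediction leaves R^2 undefined, assuming R^2 is computed as the squared correlation between predictions and observations (an assumption about the definition, not a copy of caret's code). The data values are made up for illustration.

```r
# Hypothetical observed values and a constant prediction (illustrative data only).
obs  <- c(10.3, 16.4, 21.0, 33.8, 55.4)
pred <- rep(mean(obs), length(obs))

# Correlation-based R^2: the standard deviation of `pred` is zero,
# so cor() returns NA (with a warning) and the squared value is NA as well.
r2_cor <- suppressWarnings(cor(pred, obs)^2)
print(is.na(r2_cor))

# RMSE is still well defined for the same predictions.
rmse <- sqrt(mean((pred - obs)^2))
print(is.finite(rmse))
```

A single resample with a missing R^2 would be enough to leave a missing value in the resampled performance measures, while RMSE stays usable.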

Two questions remain:

  • It is not clear to me why the R^2 calculation has to fail if the model is constant. The code in caret explicitly returns a missing value if there are not at least two different values in the model predictions. I replaced the R^2 calculation with a self-written one which does not have this limitation.
  • It is still open why rpart sometimes generates a constant model, and especially why it only does so for cross-validation methods other than LOOCV.
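For illustration only, here is one way to obtain a constant rpart model on the same data: a complexity parameter large enough that no split is kept. Whether this is what happens inside the resampling loop in the question is an assumption; the value cp = 1 is chosen purely to force a root-only tree.

```r
library(rpart)  # recommended package, shipped with standard R distributions

# cp = 1 is an illustrative assumption, not a value taken from the question:
# no split can reduce the error enough, so the tree keeps only its root node.
fit <- rpart(Volume ~ Girth + Height, data = trees,
             control = rpart.control(cp = 1))

pred <- predict(fit, newdata = trees)

# A root-only tree predicts the training mean for every row.
print(length(unique(pred)))
```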

In short: you can ignore the warning, and if you need to, write your own R^2 calculation to make it go away.
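A sketch of what such a replacement could look like. This is not the function I actually used; the names r_squared and custom_summary are hypothetical, and the definition 1 - SS_res / SS_tot is one possible choice that stays defined for constant predictions.

```r
# Hypothetical replacement for a correlation-based R^2.
# `pred` and `obs` are numeric vectors of equal length.
r_squared <- function(pred, obs) {
  ss_res <- sum((obs - pred)^2)        # residual sum of squares
  ss_tot <- sum((obs - mean(obs))^2)   # total sum of squares
  if (ss_tot == 0) return(NA_real_)    # undefined only if obs has no variance
  1 - ss_res / ss_tot
}

# Summary function taking a data frame with `obs` and `pred` columns
# and returning a named numeric vector.
custom_summary <- function(data, lev = NULL, model = NULL) {
  c(RMSE     = sqrt(mean((data$pred - data$obs)^2)),
    Rsquared = r_squared(data$pred, data$obs))
}

# Constant predictions (illustrative data only) no longer give a missing value.
obs <- c(10.3, 16.4, 21.0, 33.8, 55.4)
res <- custom_summary(data.frame(obs = obs, pred = rep(mean(obs), length(obs))))
print(res)
```

The summary function follows the shape caret documents for trainControl(summaryFunction = ...), so it could be passed to train() via trControl; I have not verified that against the caret version listed in the question.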

theomega answered Oct 08 '22 04:10