I'm trying to use the R caret package for model generation, and I want to use a cross-validation method. I found that the only cross-validation method that works together with rpart is LOOCV (leave-one-out cross-validation).
The following code produces a warning:
library(caret)
data(trees)
formula=Volume~Girth+Height
train(formula, data=trees, method='rpart')
Warning message: In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method, : There were missing values in resampled performance measures.
What does this warning mean, and how do I make it go away? I searched the internet and found not a single hit for this message. I traced it down to the rpart model generation: it somehow triggers this warning, while all other model-generation methods work fine! Everything also works fine if I use LOOCV.
I traced the warning down to the workflows.R file, but I do not understand why this warning gets thrown.
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] earth_3.2-3 plotrix_3.4 plotmo_1.3-1
[4] leaps_2.9 doMC_1.2.5 multicore_0.1-7
[7] iterators_1.0.6 forecast_3.20 RcppArmadillo_0.3.0.2
[10] Rcpp_0.9.10 fracdiff_1.4-1 tseries_0.10-28
[13] zoo_1.7-7 quadprog_1.5-4 caret_5.15-023
[16] foreach_1.4.0 cluster_1.14.2 reshape_0.8.4
[19] plyr_1.7.1 lattice_0.20-6 mda_0.4-2
[22] class_7.3-3 rpart_3.1-52 data.table_1.8.0
loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.0 grid_2.15.0
With the help of the R mailing list and the caret author, I found the following solution:
The warning occurs when the generated model is constant, meaning that it yields the same prediction for all input values. In that case the calculation of R^2 fails. caret calculates R^2 by default, but it does not use the R^2 value for model selection by default (for regression it selects on RMSE), so you can ignore the warning.
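To see why a constant model breaks R^2, here is a minimal sketch in base R. It assumes R^2 is computed as the squared correlation between observed and predicted values; the numbers are made up for illustration and do not come from the trees data.

```r
# Observed values vary, but a "constant" model predicts the same value everywhere.
obs  <- c(10.3, 16.4, 21.0, 31.7, 55.4)   # illustrative values only
pred <- rep(mean(obs), length(obs))

# Squared Pearson correlation: sd(pred) is 0, so the correlation is undefined
# and R returns NA (with a "standard deviation is zero" warning).
r2_cor <- suppressWarnings(cor(obs, pred)^2)
print(is.na(r2_cor))

# RMSE is still well defined for the same predictions.
rmse <- sqrt(mean((obs - pred)^2))
print(is.finite(rmse))
```

If some resamples produce such a constant tree, their R^2 is NA, which would explain the "missing values in resampled performance measures" message while RMSE stays usable.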
Two questions remain:
caret's R^2 calculation explicitly fails if there are not at least two different values in the model predictions. I replaced the R^2 calculation with a self-written one that does not have this limitation. In short: you can ignore the warning and, if you need to, write your own R^2 function to get rid of it.
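One possible way to plug in a self-written R^2 is a custom summary function passed to trainControl. This is a sketch, not necessarily what I used: the name r2_summary is made up, and it assumes R^2 defined as 1 - SS_res/SS_tot, which stays defined for constant predictions as long as the observed values vary.

```r
# Hypothetical summary function: caret passes a data frame with the columns
# 'obs' and 'pred' and expects a named numeric vector back.
r2_summary <- function(data, lev = NULL, model = NULL) {
  ss_res <- sum((data$obs - data$pred)^2)        # residual sum of squares
  ss_tot <- sum((data$obs - mean(data$obs))^2)   # total sum of squares
  c(RMSE     = sqrt(mean((data$obs - data$pred)^2)),
    Rsquared = if (ss_tot > 0) 1 - ss_res / ss_tot else NA_real_)
}

# Quick check with a constant prediction: R^2 is 0 instead of NA.
demo <- data.frame(obs = c(1, 2, 3, 4), pred = rep(2.5, 4))
res  <- r2_summary(demo)
print(res)

# Usage with caret, run only if caret and rpart are installed.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("rpart", quietly = TRUE)) {
  library(caret)
  ctrl <- trainControl(summaryFunction = r2_summary)
  fit  <- train(Volume ~ Girth + Height, data = trees, method = "rpart",
                metric = "RMSE", trControl = ctrl)
}
```

Note that this definition of R^2 can be negative when the predictions are worse than the mean of the observed values, so it is not interchangeable with the squared correlation.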