I'm using the caret library in R for model generation. I want to generate an earth (aka MARS) model and specify the degree parameter for it. According to the documentation (page 11), the earth method supports this parameter.
I get the following error message when specifying the parameter:
> library(caret)
> data(trees)
> train(Volume~Girth+Height, data=trees, method='earth', degree=1)
Error in { :
task 1 failed - "formal argument "degree" matched by multiple actual arguments"
How can I avoid this error when specifying the degree parameter?
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] earth_3.2-3 plotrix_3.4 plotmo_1.3-1 leaps_2.9 caret_5.15-023
[6] foreach_1.4.0 cluster_1.14.2 reshape_0.8.4 plyr_1.7.1 lattice_0.20-6
loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.0 grid_2.15.0 iterators_1.0.6
[5] tools_2.15.0
I have always found the functions in caret both useful and somewhat maddening. Here's what's going on.
You're attempting to pass an argument to earth via the ... argument to train. The documentation for train contains this description for that argument:
arguments passed to the classification or regression routine (such as randomForest). Errors will occur if values for tuning parameters are passed here.
Tuning parameter, eh? Well, if you scroll down and examine the official list of tuning parameters for each model type, you'll see that for earth, they are degree and nprune.
So the issue here is that train
is designed to automate some grid searching along tuning parameters, and the ...
argument is to be used for passing further arguments to the model fitting function except for those tuning parameters.
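The error message itself can be reproduced in base R without caret. The sketch below uses two hypothetical stand-in functions (fit_model and tune are made up for illustration and are not caret internals); it assumes that train does something broadly similar, supplying degree itself while also forwarding whatever arrives in ...:

```r
# Hypothetical stand-in for a model-fitting function such as earth()
fit_model <- function(x, degree = 1) degree

# Hypothetical stand-in for train(): it sets `degree` itself and forwards `...`
tune <- function(x, ...) {
  fit_model(x, degree = 2, ...)
}

tune(1)  # fine: fit_model() receives degree exactly once

# Passing degree through `...` means fit_model() receives it twice
msg <- tryCatch(tune(1, degree = 1), error = function(e) conditionMessage(e))
print(msg)
```

The second call fails with the same "matched by multiple actual arguments" message as in the question, because degree reaches the inner function from two places at once.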
If you want to set the tuning parameters you'll need to use other arguments, like so:
train(Volume~Girth+Height, data=trees, method='earth',
tuneGrid = data.frame(.degree = 1,.nprune = 5))
Note how the columns are named with leading periods. Also, frustratingly, since the default value of nprune in earth is NULL, I'm not sure you can pass only the default values in this way. (Generally, setting things to NULL in data frames will simply remove them.)
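The point about NULL in data frames can be checked in base R. A small sketch, using the same column names as the grid above:

```r
# Build a one-row grid, then try to "set" nprune to NULL
grid <- data.frame(.degree = 1, .nprune = 5)
grid$.nprune <- NULL

# The column is dropped rather than stored as a NULL value
print(names(grid))
```

Only the .degree column remains, so a data frame cannot carry nprune = NULL as a grid value.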
I found out how to do it; joran pointed me in the right direction:
Create a new function which generates the training grid. This function must accept the two parameters len and data. To retrieve the original training grid, you can call the createGrid function provided by the caret package. You can then modify the grid to suit your needs. For example, to leave the nprune parameter unchanged and add degree from 1 to 5, use the following code:
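createMARSGrid <- function(len, data) {
  # Start from caret's default grid for earth, then cross its nprune values with degree 1 to 5
  g <- createGrid("earth", len, data)
  g <- expand.grid(.nprune = g$.nprune, .degree = seq(1, 5))
  return(g)
}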
Then invoke it like this:
train(formula, data=data, method='earth', tuneGrid = createMARSGrid)
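A note for readers on a later caret release: as far as I know, from caret 6.0 onward the tuneGrid columns are named without the leading period and createGrid is no longer available, so the grid would be built directly with expand.grid. The following is a sketch under those assumptions, not something tested against the caret 5.15 session shown in the question; the nprune range 2:10 is an arbitrary illustrative choice, not caret's default:

```r
# Assumption: caret >= 6.0, where tuneGrid columns have no leading period.
# The nprune values are illustrative only, not caret's defaults.
mars_grid <- expand.grid(degree = 1:5, nprune = 2:10)

# Only attempt the fit if both packages are installed
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("earth", quietly = TRUE)) {
  library(caret)
  data(trees)
  fit <- train(Volume ~ Girth + Height, data = trees,
               method = "earth", tuneGrid = mars_grid)
  print(fit$bestTune)  # the degree/nprune pair chosen by resampling
}
```

Here the grid is passed as a data frame rather than as a function, so every degree and nprune combination is spelled out up front.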