
caret::train: specify model-generation-parameters

Tags: r, r-caret

I'm using the caret library in R for model generation. I want to generate an earth (aka MARS) model and I want to specify the degree parameter for this model generation. According to the documentation (page 11) the earth method supports this parameter.

I get the following error message when specifying the parameter:

> library(caret)
> data(trees)
> train(Volume~Girth+Height, data=trees, method='earth', degree=1)
Error in { : 
  task 1 failed - "formal argument "degree" matched by multiple actual arguments"

How can I avoid this error when specifying the degree parameter?

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] earth_3.2-3    plotrix_3.4    plotmo_1.3-1   leaps_2.9      caret_5.15-023
 [6] foreach_1.4.0  cluster_1.14.2 reshape_0.8.4  plyr_1.7.1     lattice_0.20-6

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.0 grid_2.15.0     iterators_1.0.6
[5] tools_2.15.0   
asked May 08 '12 12:05 by theomega



2 Answers

I have always found the functions in caret both useful and somewhat maddening. Here's what's going on.

You're attempting to pass an argument to earth via the ... argument to train. The documentation for train contains this description for that argument:

arguments passed to the classification or regression routine (such as randomForest). Errors will occur if values for tuning parameters are passed here.

Tuning parameter, eh? Well, if you scroll down and examine the official list of tuning parameters for each model type, you'll see that for earth, they are degree and nprune.

So the issue here is that train is designed to automate some grid searching along tuning parameters, and the ... argument is to be used for passing further arguments to the model fitting function except for those tuning parameters.
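The error message itself can be reproduced in base R, without caret. The sketch below is only an illustration of the mechanism: fit_model and tune_wrapper are hypothetical stand-ins, not caret's actual internals. The wrapper sets degree itself and also forwards ..., so a degree supplied by the caller reaches the inner function twice.

```r
# Hypothetical stand-in for a model-fitting function with a tuning parameter.
fit_model <- function(x, degree = 1) x^degree

# Hypothetical wrapper: it sets `degree` itself AND forwards `...`,
# loosely mirroring how train() supplies tuning parameters on its own.
tune_wrapper <- function(x, ...) fit_model(x, degree = 2, ...)

# Passing `degree` through `...` makes fit_model() receive it twice.
msg <- tryCatch(
  tune_wrapper(3, degree = 1),
  error = function(e) conditionMessage(e)
)
print(msg)

# Leaving `degree` out of `...` works, and the wrapper's own value is used.
ok <- tune_wrapper(3)
print(ok)
```

The first call fails with the same "matched by multiple actual arguments" message as in the question; the second succeeds because only the wrapper supplies degree.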

If you want to set the tuning parameters you'll need to use other arguments, like so:

train(Volume ~ Girth + Height, data = trees, method = 'earth',
      tuneGrid = data.frame(.degree = 1, .nprune = 5))

Note how the columns are named with leading periods. Frustratingly, the default value of nprune in earth is NULL, so I'm not sure you can pass only the default values in this way. (Generally, setting a data frame column to NULL simply removes it.)
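The remark about NULL can be checked in base R. This is a small sketch of the data frame behavior only; it does not call train, and the variable names grid and grid_null are made up for illustration.

```r
# Build the grid with an explicit value for both parameters.
grid <- data.frame(.degree = 1, .nprune = 5)
print(names(grid))

# Supplying NULL for a column silently drops it, so the grid ends up
# with a single column instead of carrying earth's NULL default for nprune.
grid_null <- data.frame(.degree = 1, .nprune = NULL)
print(names(grid_null))

# Assigning NULL to an existing column removes it in the same way.
grid$.nprune <- NULL
print(ncol(grid))
```

So a tuneGrid built this way cannot represent nprune = NULL; an explicit numeric value has to be chosen.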

answered Oct 05 '22 07:10 by joran


I found out how to do it; joran pointed me in the right direction:

Create a new function that generates the training grid. This function must accept the two parameters len and data. To retrieve the original training grid, you can call the createGrid function provided by the caret package, and then modify the grid to suit your needs. For example, to leave the nprune parameter unchanged and add degree values from 1 to 5, use the following code:

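  # Tuning-grid generator; it must accept the arguments len and data.
  createMARSGrid <- function(len, data) {
      # Start from caret's default grid for earth to keep its nprune values.
      g <- createGrid("earth", len, data)
      # Cross those nprune values with degree 1 through 5.
      g <- expand.grid(.nprune = g$.nprune, .degree = seq(1, 5))
      return(g)
  }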

Then invoke it like this:

train(formula, data=data, method='earth', tuneGrid = createMARSGrid)
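For readers on a newer caret release, here is a minimal sketch of the same idea without the helper function. It rests on assumptions that are not in the original answer: that the installed caret (6.0 or later, as far as I know) expects tuneGrid to be a plain data frame with columns named after the tuning parameters without leading periods, and that createGrid may not be available there. The nprune range 2:5 is an illustrative choice, not caret's default grid.

```r
# Candidate values for both earth tuning parameters (illustrative values).
mars_grid <- expand.grid(nprune = 2:5, degree = 1:5)
print(dim(mars_grid))

# Fit only if both packages are installed; the grid is built either way.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("earth", quietly = TRUE)) {
  library(caret)
  set.seed(1)  # make the resampling reproducible
  fit <- train(Volume ~ Girth + Height, data = trees,
               method = "earth", tuneGrid = mars_grid)
  # Combination of nprune and degree that train() selected.
  print(fit$bestTune)
}
```

Every row of mars_grid is one candidate model, so train evaluates all 20 combinations by resampling and keeps the best one.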
answered Oct 05 '22 07:10 by theomega