Error in train from Caret

Tags:

r

r-caret

I'm baffled. I've used train before with no problem. But now I'm repeatedly getting the "unused arguments" error.

library(caret)

#Generate random data
y <- rnorm(100, mean=.5)
x <- rnorm(100)
data <- cbind(x, y)
form <- y ~ x

fitControl <- trainControl(## 8-fold CV
                       method = "cv",
                       number = 8)

set.seed(825)
lmFit1 <- train(x, y, method = "lm", trControl = fitControl, na.action=na.omit)
lmFit1 <- train(form, data = data, method = "lm", trControl = fitControl, na.action=na.omit)

Since I am running a linear regression, I've specified this model both with x and y, and with form. Both generate the same error.

Error in train(form, method = "lm", trControl = fitControl, na.action = na.omit) : unused arguments (method = "lm", trControl = fitControl, na.action = na.omit)
Error in train(x, y, method = "lm", trControl = fitControl, na.action = na.omit) : unused arguments (y, method = "lm", trControl = fitControl, na.action = na.omit)

In my actual data, I have many more predictors, and have played around with only including 1 or 2 predictors at a time, but all generate the same error. Even the random data generates the error.

Any thoughts? Help is much appreciated! Thanks!

asked Sep 08 '15 16:09 by user2573355




2 Answers

I also had the same issue. It seems that another package loaded in your session also defines a function named train, which masks caret's version. Use caret::train instead of train.
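A minimal sketch of how the masking produces this exact error and how to confirm it. It assumes caret is installed. The one-argument train() defined here is a hypothetical stand-in for whatever other package's train() is shadowing caret's. The names msg, where, df and fit are illustrative only.

```r
library(caret)

# Hypothetical stand-in for another package's `train` with a narrower
# signature. Defining it in the global environment masks caret's train().
train <- function(x) x

# An unqualified call now reaches the stand-in, which rejects caret's
# arguments with an "unused argument" error, as in the question.
msg <- tryCatch(train(1, method = "lm"), error = conditionMessage)
print(msg)

# find() lists every attached environment that defines `train`; the first
# entry is the one an unqualified call resolves to.
where <- find("train")
print(where)

# Qualifying the call with caret:: bypasses the masking.
set.seed(825)
df <- data.frame(x = rnorm(100), y = rnorm(100, mean = 0.5))
fit <- caret::train(y ~ x, data = df, method = "lm",
                    trControl = trainControl(method = "cv", number = 8))

# Remove the stand-in so that plain train() refers to caret's again.
rm(train)
```

In a real session, running find("train") right after the error shows which package is masking caret. Writing caret::train avoids the problem regardless of package load order.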

answered Sep 28 '22 20:09 by Abhinav Piplani


You probably updated the caret package. If you look at the package's changelog, you can see the following:

Changes in version 6.0-34

For the input data x to train, we now respect the class of the input value to accommodate other data types (such as sparse matrices). There are some complications though; for pre-processing we throw a warning if the data are not simple matrices or data frames since there is some infrastructure that does not exist for other classes (e.g. complete.cases). We also throw a warning if returnData = TRUE and it cannot be converted to a data frame. This allows sparse matrices and text corpora to be used as inputs into that function.
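To check whether your installed caret already includes the 6.0-34 change quoted above, compare its version directly. This is a small sketch that assumes caret is installed. includes_change is an illustrative name.

```r
# packageVersion() returns a package_version object, which compares
# correctly against a version string such as "6.0-34".
v <- packageVersion("caret")
print(v)

# TRUE if the installed caret already includes the 6.0-34 change to how
# train() treats the class of x.
includes_change <- v >= "6.0-34"
print(includes_change)
```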

Further, the help page for train says:

x: an object where samples are in rows and features are in columns. This could be a simple matrix, data frame or other type (e.g. sparse matrix). See Details below.

And from the Details section:

The predictors in x can be most any object as long as the underlying model fit function can deal with the object class. The function was designed to work with simple matrices and data frame inputs, so some functionality may not work (e.g. pre-processing). When using string kernels, the vector of character strings should be converted to a matrix with a single column.

The second train model runs without problems for me. For the first model, just pass data.frame(x) instead of x.

library(caret)

#Generate random data
y <- rnorm(100, mean=.5)
x <- rnorm(100)
data <- cbind(x, y)
form <- y ~ x    

fitControl <- trainControl(## 8-fold CV
          method = "cv",
          number = 8)

set.seed(825)
# changed x to data.frame(x)
lmFit1 <- train(data.frame(x), y, method = "lm", trControl = fitControl, na.action=na.omit)
set.seed(825)
lmFit2 <- train(form, data = data, method = "lm", trControl = fitControl, na.action=na.omit)
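The sketch below makes the shape difference behind the data.frame(x) fix visible. It assumes caret is installed. It also builds the formula-interface data with data.frame(x, y) rather than cbind(x, y), which returns a matrix. The names x_df, df, ctrl, fit_xy and fit_form are illustrative. Since both interfaces fit lm on the same 100 rows, their final coefficients should agree.

```r
library(caret)

# As in the question: x is a bare numeric vector with no dim attribute.
set.seed(825)
y <- rnorm(100, mean = 0.5)
x <- rnorm(100)

# data.frame(x) turns it into a 100 x 1 data frame with one column named "x".
x_df <- data.frame(x)
print(dim(x_df))

# cbind() of two vectors gives a matrix, not a data frame; building a data
# frame directly avoids relying on any matrix-to-data-frame conversion.
print(class(cbind(x, y)))
df <- data.frame(x, y)

ctrl <- trainControl(method = "cv", number = 8)  # 8-fold CV

set.seed(825)
fit_xy <- train(x_df, y, method = "lm", trControl = ctrl)
set.seed(825)
fit_form <- train(y ~ x, data = df, method = "lm", trControl = ctrl)

# The final models are refit on all rows, so the coefficients should match.
print(coef(fit_xy$finalModel))
print(coef(fit_form$finalModel))
```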

My sessionInfo():

R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252    LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C                      
[5] LC_TIME=Dutch_Netherlands.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] caret_6.0-52    ggplot2_1.0.1   lattice_0.20-33

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.0         magrittr_1.5        splines_3.2.2       MASS_7.3-43         munsell_0.4.2       colorspace_1.2-6    foreach_1.4.2      
 [8] minqa_1.2.4         car_2.1-0           stringr_1.0.0       plyr_1.8.3          tools_3.2.2         parallel_3.2.2      pbkrtest_0.4-2     
[15] nnet_7.3-10         grid_3.2.2          gtable_0.1.2        nlme_3.1-121        mgcv_1.8-7          quantreg_5.19       MatrixModels_0.4-1 
[22] iterators_1.0.7     gtools_3.5.0        lme4_1.1-9          digest_0.6.8        Matrix_1.2-2        nloptr_1.0.4        reshape2_1.4.1     
[29] codetools_0.2-14    stringi_0.5-5       compiler_3.2.2      BradleyTerry2_1.0-6 scales_0.3.0        stats4_3.2.2        SparseM_1.7        
[36] brglm_0.5-9         proto_0.3-10       
answered Sep 28 '22 20:09 by phiver