I'm tuning an SVM in R and I receive the following error:
#Error in if (any(co)) { : missing value where TRUE/FALSE needed
I'm using the caret package:
svmRTune <- train(x=dataTrain[,predModelContinuous],y=dataTrain[,outcome],method = "svmRadial", tuneLength = 14, trControl = trCtrl)
The training set structure is:
str(dataTrain)
'data.frame': 40001 obs. of 42 variables:
$ PolNum : num 2e+08 2e+08 2e+08 2e+08 2e+08 ...
$ sex : Factor w/ 2 levels "Male","Female": 1 1 1 2 1 2 1 1 1 2 ...
$ type : Factor w/ 6 levels "A","B","C","D",..: 3 1 1 2 2 4 3 3 3 2 ...
$ catgry : Ord.factor w/ 3 levels "Large"<"Medium"<..: 2 2 2 3 3 3 3 2 2 2 ...
$ occup : Factor w/ 5 levels "Employed","Housewife",..: 2 1 1 1 5 4 1 1 4 2 ...
$ age : num 48 23 23 39 24 39 28 43 45 38 ...
$ group : Factor w/ 20 levels "1","2","3","4",..: 15 16 12 16 14 8 16 9 12 8 ...
$ bonus : Ord.factor w/ 21 levels "-50"<"-40"<"-30"<..: 14 8 4 3 5 2 5 5 1 15 ...
$ poldur : num 7 1 1 14 2 4 11 2 8 5 ...
$ value : num 1120 21755 18430 11930 24850 ...
$ adind : Factor w/ 2 levels "No","Yes": 2 1 1 2 1 2 2 2 1 1 ...
$ Pcode : chr "SC22" "CT109" "MA1" "SA12" ...
$ Area : Factor w/ 10 levels "CT","JU","MA",..: 7 1 3 6 6 6 6 4 1 2 ...
$ Density : num 270.5 57.3 43.2 167.9 169.8 ...
$ Prem : num 1159 532 527 197 908 ...
$ Premad : num 53.1 413.7 410.7 61.6 824.6 ...
$ numclm : num 0 1 0 1 0 0 0 1 0 0 ...
$ Invite : num 1 1 1 1 1 1 1 1 1 1 ...
$ Renewaltp : num 1302 928 632 291 960 ...
$ Renewalad : num 58.4 599 440.4 71.3 682 ...
$ Markettp : num 1110 884 565 253 833 ...
$ Marketad : num 53.4 611.4 431.6 55.5 587 ...
$ Premtot : num 1212 532 527 259 908 ...
$ Renewaltot : num 1361 928 632 362 960 ...
$ Markettot : num 1163 884 565 309 833 ...
$ Renew : Ord.factor w/ 2 levels "No"<"Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ Premchng : num 1.12 1.74 1.2 1.4 1.06 ...
$ Compmeas : num 1.17 1.05 1.12 1.17 1.15 ...
$ numclmRec : Ord.factor w/ 3 levels "None"<"One"<"Two or more": 1 2 1 2 1 1 1 2 1 1 ...
$ PremChngRec: Factor w/ 20 levels "[0.546,0.758)",..: 16 20 18 19 14 3 7 19 17 11 ...
$ ageRec : Factor w/ 20 levels "[19,22)","[22,25)",..: 14 2 2 9 2 9 4 11 12 9 ...
$ valueRec : Factor w/ 20 levels "[ 1005, 3290)",..: 1 15 13 9 17 5 12 12 19 1 ...
$ densityRec : Factor w/ 20 levels "[ 14.4, 25.0)",..: 19 6 5 15 15 13 15 1 5 11 ...
$ CompmeasRec: Factor w/ 20 levels "[0.716,0.869)",..: 12 6 10 13 12 18 11 16 18 14 ...
$ poldurRec : Ord.factor w/ 16 levels "1"<"2"<"3"<"4"<..: 7 1 1 14 2 4 11 2 8 5 ...
$ ageST : num 0.407 -1.34 -1.34 -0.222 -1.27 ...
$ numclmST : num -0.433 1.627 -0.433 1.627 -0.433 ...
$ PremchngST : num 0.591 3.709 0.98 1.985 0.265 ...
$ valueST : num -1.462 0.499 0.183 -0.434 0.793 ...
$ DensityST : num 1.918 -0.748 -0.924 0.636 0.659 ...
$ CompmeasST : num 0.224 -0.539 -0.098 0.248 0.113 ...
$ poldurST : num 0.097 -1.2 -1.2 1.61 -0.984 ...
and my session info is:
sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252
[3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C
[5] LC_TIME=Italian_Italy.1252
attached base packages:
[1] parallel splines grid stats graphics grDevices utils
[8] datasets methods base
other attached packages:
[1] C50_0.1.0-16 kernlab_0.9-19 nnet_7.3-8 plyr_1.8.1
[5] gbm_2.1 randomForest_4.6-7 rpart_4.1-8 klaR_0.6-10
[9] MASS_7.3-31 doParallel_1.0.8 iterators_1.0.6 foreach_1.4.1
[13] pROC_1.7.1 mda_0.4-4 class_7.3-10 earth_3.2-7
[17] plotrix_3.5-5 plotmo_1.3-3 Formula_1.1-1 survival_2.37-7
[21] caret_6.0-24 ggplot2_0.9.3.1 lattice_0.20-29 rj_1.1.3-1
loaded via a namespace (and not attached):
[1] car_2.0-19 cluster_1.15.2 codetools_0.2-8
[4] colorspace_1.2-4 combinat_0.0-8 compiler_3.0.2
[7] dichromat_2.0-0 digest_0.6.4 gtable_0.1.2
[10] Hmisc_3.14-3 labeling_0.2 latticeExtra_0.6-26
[13] munsell_0.4.2 proto_0.3-10 RColorBrewer_1.0-5
[16] Rcpp_0.11.1 reshape2_1.2.2 rj.gd_1.1.3-1
[19] scales_0.2.3 stringr_0.6.2 tools_3.0.2
Just posting in case anyone else runs across this problem. It appears to be caused by including a factor or character variable in your training data set.
Why svm cannot take a factor variable, I do not know. I replaced my factors with hand-coded dummy variables and it worked fine, but the approach was too inelegant to document.
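For illustration, here is a minimal sketch of what hand-coded dummies could look like for the sex and type factors shown in str(dataTrain) above (column and level names are taken from that output; this is just one possible coding, not the original poster's exact code):

# Sketch: one 0/1 indicator column per factor level, coded by hand.
# 'sex' (levels "Male"/"Female") and 'type' come from str(dataTrain) above.
dataTrain$sexMale <- as.numeric(dataTrain$sex == "Male")
for (lv in levels(dataTrain$type)) {
  dataTrain[[paste0("type_", lv)]] <- as.numeric(dataTrain$type == lv)
}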
I can confirm Dan Brown's answer: the error seems to be caused by having factors in the data. I wrote the following code to turn factors into dummy variables. It is not especially pretty, but it does the job.
library("foreach")
# Helper function, use the other one
# takes a column name (pointing to a factor variable) and a dataset
# returns a dataframe containing a 1-in-K coding for this factor variable
col_to_dummy <- function(colname, data) {
# tmp is a dataframe of K columns, where K is the number of levels of the factor in colname
# it is a 1-in-K dummy variable coding
levelnames <- levels(data[[colname]])
dummy <- foreach(i=1:length(levelnames), .combine=cbind) %do% {
as.numeric(as.numeric(data[[colname]])==i)
}
dummy <- as.data.frame(dummy)
names(dummy) <- paste0(colname, ":", levelnames)
dummy
}
factor_to_dummy <- function(obsdata) {
# finding the columns containing a factor variable
col_factor <- unlist(lapply(FUN=is.factor, obsdata))
# if they are none, then nothing to do
if(!any(col_factor)) {
return(obsdata)
}
# otherwise
# for each of these, convert it to dummy variables using col_to_dummy
foreach(colname=names(which(col_factor)), .combine = cbind,
.init = obsdata[,-which(col_factor)]) %do% {
col_to_dummy(colname, obsdata)
}
# each resulting data.frame is c-bound with the dataset without factors
}
Some solutions out there use model.matrix, but be aware that by default model.matrix uses a reference level (intercept) and then a 1-of-(K-1) coding scheme for all factors. You will need to tinker with the contrasts arguments to get the coding you want.
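For example, a minimal sketch of forcing a full 1-of-K coding via contrasts.arg, using the sex and type factors from str(dataTrain) above (adjust the column list to your own factor predictors):

# Sketch: full 1-of-K dummy coding with model.matrix; contrasts.arg forces
# one indicator column per level instead of the default K-1 reference coding.
fac <- c("sex", "type")
dummies <- model.matrix(~ ., data = dataTrain[, fac],
                        contrasts.arg = lapply(dataTrain[, fac],
                                               contrasts, contrasts = FALSE))
dummies <- dummies[, -1, drop = FALSE]  # drop the intercept column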
This code is really easy to use. Once the function definitions have been run, you can simply do:
df_with_dummy_vars <- factor_to_dummy(original_df)
All factor columns will be converted to dummy variables.
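With the predictors converted, the train() call from the question can then be re-run on purely numeric data. A sketch reusing the question's own objects (predModelContinuous, outcome and trCtrl), which I assume are unchanged:

# Sketch: dummy-code the chosen predictors, then retry the caret call
# from the question on the resulting all-numeric data frame.
dataTrainNum <- factor_to_dummy(dataTrain[, predModelContinuous])
svmRTune <- train(x = dataTrainNum, y = dataTrain[, outcome],
                  method = "svmRadial", tuneLength = 14, trControl = trCtrl)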