For the sake of the following discussion, I'll create this fake training data frame:
> dataset = data.frame(result=c("yes","yes","no","no","no"),
+ s1=seq(0,8,2), s2=seq(1,9,2))
> dataset
result s1 s2
1 yes 0 1
2 yes 2 3
3 no 4 5
4 no 6 7
5 no 8 9
>
I'm trying to train multiple kernlab KSVM models from multiple data frames similar to the one shown above. The result column is actually named differently in each data frame (it's named after what the model trained on that dataset is supposed to predict).
I'm still pretty new to R, so the syntax I'm using is just modeled (no pun intended) after code I cut-and-pasted from Rattle's log tab:
trainedModel = ksvm(as.factor(result) ~ ., data=dataset[,c(input, target)], ...)
...where result is the name of the column in the dataset data frame. I understand that as.factor(result) ~ . is a formula, that the stuff on the left side of the ~ is somehow derived from the stuff on the right side of the ~, and that the . just means "everything else not specified on the left side of the ~". At least I think that's what it means.
My problem is that I want to be able to create & train these models programmatically, and the name of the target column in the input dataset will change.
How can I specify "colnames(dataset)[1]" (i.e. the name of the column, determined dynamically without knowing it at coding time) in the code as.factor(result)?
In R, you can convert multiple numeric variables to factors with the lapply function. lapply is part of the apply family of functions, which apply a function to each element of a list or vector in place of an explicit loop. In R, categorical variables should generally be stored as factors.
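A minimal sketch of that conversion, assuming a hypothetical data frame dat whose columns a and b hold numeric codes that should be categorical while c stays numeric. Because a data frame is a list of columns, lapply() can apply factor() to the selected columns and the resulting list can be assigned back in one step.

```r
# Hypothetical example data: a and b are numeric codes, c is a measurement
dat <- data.frame(a = c(1, 2, 1), b = c(3, 3, 4), c = c(0.5, 1.5, 2.5))

# Columns to convert (chosen for illustration)
cols <- c("a", "b")

# lapply() returns a list of factors; assigning it to dat[cols]
# replaces those columns and leaves the others untouched
dat[cols] <- lapply(dat[cols], factor)

sapply(dat, class)
```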
Factors in R are stored as a vector of integer values with a corresponding set of character values to use when the factor is displayed. The factor function is used to create a factor. The only required argument to factor is a vector of values which will be returned as a vector of factor values.
We can check whether a variable is a factor using the class() function. Similarly, the levels of a factor can be checked using the levels() function.
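A short base-R sketch of the points above: a factor keeps integer codes plus a set of labels (its levels), and class(), is.factor() and levels() let you inspect it. The vector x is made up for illustration.

```r
x <- factor(c("no", "yes", "no"))

as.integer(x)   # the stored integer codes: 1 2 1
levels(x)       # the labels used for display: "no" "yes"

class(x)        # "factor"
is.factor(x)    # TRUE
is.factor(c("no", "yes", "no"))   # FALSE: a plain character vector
```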
In R, factors are used to work with categorical variables, variables that have a fixed and known set of possible values. They are also useful when you want to display character vectors in a non-alphabetical order. Historically, factors were much easier to work with than characters.
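To illustrate the ordering point, here is a small sketch with a made-up vector of sizes: sorting the character vector is alphabetical, while sorting a factor follows the order of its levels, which you can set explicitly.

```r
sizes <- c("small", "large", "medium", "small")

# Character vectors sort alphabetically
sort(sizes)   # "large" "medium" "small" "small"

# A factor sorts by the order of its levels
f <- factor(sizes, levels = c("small", "medium", "large"))
sort(f)       # small small medium large
```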
as.formula (see ?as.formula) converts a character string into a formula, and you can build that string with paste. Putting these together, you can create a formula from a column name stored in a variable, for example:
result_column = colnames(dataset)[1]
as.formula(paste0("as.factor(", result_column, ") ~ ."))
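Applied to the original goal of training one model per data frame, the same idea can be wrapped in a function. This is a sketch under two assumptions: the target is the first column of each data frame (as in the question), and kernlab is installed when train_all() is actually called. target_formula(), train_all() and the renamed column churn are hypothetical names for illustration.

```r
# Build "as.factor(<target>) ~ ." from a column name known only at run time.
# Assumption: the target is the first column unless another name is given.
target_formula <- function(df, target = colnames(df)[1]) {
  # Backticks keep non-syntactic column names (e.g. with spaces) parseable
  as.formula(paste0("as.factor(`", target, "`) ~ ."))
}

# Hypothetical helper: fit one kernlab model per data frame in a list.
# Requires the kernlab package; it is not called in this sketch.
train_all <- function(datasets) {
  lapply(datasets, function(df) kernlab::ksvm(target_formula(df), data = df))
}

d1 <- data.frame(result = c("yes", "yes", "no", "no", "no"),
                 s1 = seq(0, 8, 2), s2 = seq(1, 9, 2))
# Same data with a differently named target column
d2 <- setNames(d1, c("churn", "s1", "s2"))

target_formula(d1)   # as.factor(result) ~ .
target_formula(d2)   # as.factor(churn) ~ .
```

Building the string inside a function keeps the column-name handling in one place, so the training loop never needs to know what each target is called.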