
R formula with as.factor(): any way to specify the argument as a variable content instead of directly by name?

Tags: r

For the sake of the following discussion, I'll create this fake training data frame:

> dataset = data.frame(result=c("yes","yes","no","no","no"),
                       s1=seq(0,8,2), s2=seq(1,9,2))
> dataset
  result s1 s2
1    yes  0  1
2    yes  2  3
3     no  4  5
4     no  6  7
5     no  8  9
> 

I'm trying to train multiple kernlab KSVM models from multiple data frames similar to the one shown above. The result column is actually named differently in each data frame (it's named after whatever the model trained on that dataset is supposed to predict).

I'm still pretty new to R, so the syntax I'm using is just modeled (no pun intended) after code I cut-and-pasted from Rattle's log tab:

trainedModel = ksvm(as.factor(result) ~ ., data=dataset[,c(input, target)], ...)

...where result is the name of the column in the dataset data frame. I understand that as.factor(result) ~ . is a formula: the thing on the left side of the ~ is modeled as a function of the things on the right side, and the . just means "every other column in the data that isn't already on the left side of the ~". At least I think that's what it means.
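One way to check that reading of the . is to ask base R how it expands the formula against the data. This is only a small sketch using terms() from the stats package on the fake dataset above; the variable names expanded and predictors are made up for the illustration, and it says nothing about what ksvm does internally.

```r
# Fake training data, same as above.
dataset <- data.frame(result = c("yes", "yes", "no", "no", "no"),
                      s1 = seq(0, 8, 2), s2 = seq(1, 9, 2))

# terms() expands the "." using the columns of `data`,
# leaving out any column that is used on the left-hand side.
expanded <- terms(as.factor(result) ~ ., data = dataset)

# The predictor names that "." stood for.
predictors <- attr(expanded, "term.labels")
print(predictors)
```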

My problem is that I want to be able to create & train these models programmatically, and the name of the target column in the input dataset will change.

How can I use "colnames(dataset)[1]" (i.e. a column name determined dynamically, not known at coding time) in place of result in as.factor(result)?

phonetagger asked May 01 '15 03:05



1 Answer

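See ?as.formula: it converts a character string into a formula, and you can assemble that string with paste. Putting these together, you can build a formula from variables. Here result_column is a variable holding the name of the target column, for example: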

as.formula(paste("as.factor(",result_column,") ~ ."))
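For completeness, here is a minimal end-to-end sketch using the fake data frame from the question. It assumes the target is the first column, as in the question; model_formula and mf are just names picked for this example. model.frame() is used only to show that the formula evaluates against the data, and the ksvm() call is left commented out because it needs kernlab installed.

```r
# Fake training data from the question.
dataset <- data.frame(result = c("yes", "yes", "no", "no", "no"),
                      s1 = seq(0, 8, 2), s2 = seq(1, 9, 2))

# Name of the target column, determined at run time.
result_column <- colnames(dataset)[1]

# Build the string "as.factor(result) ~ ." and convert it to a formula.
model_formula <- as.formula(paste0("as.factor(", result_column, ") ~ ."))
print(model_formula)

# model.frame() evaluates the formula against the data; the first
# column of the result is the left-hand side, built as a factor.
mf <- model.frame(model_formula, data = dataset)
str(mf)

# With kernlab installed, the same formula could be passed on:
# trainedModel <- kernlab::ksvm(model_formula, data = dataset)
```

If the column name is not a syntactically valid R name (for example, it contains spaces), it would need to be wrapped in backticks inside the pasted string.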
Brandon Bertelsen answered Oct 06 '22 02:10