I have a formula with an arbitrary number of variables on the left and right-hand sides:
a + b * c ~ d + e
This formula can include various operators like + or *. I would like to wrap each variable of the formula in a transformation. For example, if my transformation is called Factor, then the formula above becomes:
Factor(a) + Factor(b) * Factor(c) ~ Factor(d) + Factor(e)
Note that the operators and their positions are preserved.
1) rrapply We can use rrapply to recursively walk the formula and surround every node that is a syntactic name with Factor(...). Alternately we could use is.word <- function(x) grepl("^\\w+$", x) to check for names that only contain word characters.
library(rrapply)
fo <- a + b * c ~ d + e
is.word <- function(x) make.names(x) == x
insert.Factor <- function(x) substitute(Factor(x), list(x = x))
rrapply(fo, is.word, insert.Factor)
## Factor(a) + Factor(b) * Factor(c) ~ Factor(d) + Factor(e)
If we can have formulas such as
fo2 <- a + b * c ~ I(d) + e
and we want I(Factor(d)) rather than Factor(I)(Factor(d)) then use this for is.word:
is.word <- function(x) make.names(x) == x && format(x) %in% all.vars(fo2)
2) gsub Convert to character string, perform the substitution and convert back. The input, fo, is defined above.
formula(gsub("(\\w+)", "Factor(\\1)", format(fo)), environment(fo))
## Factor(a) + Factor(b) * Factor(c) ~ Factor(d) + Factor(e)
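One caveat with the pattern above is that \\w+ matches every word, so function names such as I and numeric constants get wrapped too. A minimal sketch of one way around this, assuming the variable names contain no regex metacharacters, is to build the pattern from all.vars(). The names pat, txt and fo3 are introduced here only for illustration.

```r
fo2 <- a + b * c ~ I(d) + e

# Build an alternation of the variable names only: "\\b(a|b|c|d|e)\\b".
# Assumption: the names contain no regex metacharacters (such as ".").
pat <- sprintf("\\b(%s)\\b", paste(all.vars(fo2), collapse = "|"))

# deparse() can split a long formula over several strings, so collapse them.
txt <- paste(deparse(fo2, width.cutoff = 500L), collapse = " ")

fo3 <- formula(gsub(pat, "Factor(\\1)", txt, perl = TRUE), environment(fo2))
fo3
## Factor(a) + Factor(b) * Factor(c) ~ I(Factor(d)) + Factor(e)
```

This still works on text, so a variable that shares its name with a function used in the formula would have both occurrences wrapped.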
3) Transform data frame If these variables will be obtained from a data frame DF then we could transform its columns and leave the formula as is.
DF[] <- lapply(DF, Factor)
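Note that DF[] <- lapply(DF, Factor) transforms every column. If DF also holds columns that the formula does not use, a possible variation is to restrict the transformation to all.vars() of the formula. This is only a sketch: the toy data and the definition of Factor below are stand-ins invented for illustration, not the asker's actual transformation.

```r
# Hypothetical stand-in for the transformation; any function of a column works.
Factor <- function(x) log(x)

# Toy data, invented for illustration; `id` is not part of the formula.
DF <- data.frame(id = 1:3, a = c(1, 2, 4), b = c(3, 5, 7))
fo <- a ~ b

# Transform only the columns that the formula refers to.
vars <- all.vars(fo)
DF[vars] <- lapply(DF[vars], Factor)
```

The formula itself stays untouched, so it can be passed to a modelling function together with the transformed DF.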
Here is a way to update a formula with a recursive function:
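update_formula <- function(x){
  if(length(x) == 3){
    # binary operator: keep the operator, recurse into both operands
    x[[2]] <- update_formula(x[[2]])
    x[[3]] <- update_formula(x[[3]])
    return(x)
  }else{
    # anything else is treated as a leaf and wrapped in Factor()
    return(substitute(Factor(var), list(var = x)))
  }
}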
f <- a + b * c ~ d + e
update_formula(f)
# Factor(a) + Factor(b) * Factor(c) ~ Factor(d) + Factor(e)
The main idea is that a call to a binary operator is a language object of length 3: the operator itself, followed by its two operands. For example:
> as.list(f)
[[1]]
`~`
[[2]]
a + b * c
[[3]]
d + e
> as.list(f[[2]])
[[1]]
`+`
[[2]]
a
[[3]]
b * c
> as.list(f[[3]])
[[1]]
`+`
[[2]]
d
[[3]]
e
So we update the second and third component each time we encounter a binary operator.
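Because the length-3 test treats everything else as a leaf, a term such as I(d) or (a + b) would be wrapped whole, and a two-argument call such as poly(e, 2) would have its constant wrapped as well. A possible generalisation, sketched here under the assumption that only symbols returned by all.vars() should be wrapped, recurses into calls of any length. The name wrap_vars is hypothetical.

```r
# Hypothetical helper: wrap every variable of a formula in Factor().
wrap_vars <- function(x, vars = all.vars(x)) {
  force(vars)  # fix the variable list before x is modified
  if (is.name(x)) {
    # a bare symbol: wrap it only if it is one of the formula's variables
    if (as.character(x) %in% vars) as.call(list(quote(Factor), x)) else x
  } else if (is.call(x)) {
    # a call of any length: skip the function position, recurse into arguments
    for (i in seq_along(x)[-1]) x[[i]] <- wrap_vars(x[[i]], vars)
    x
  } else {
    x  # constants such as 2 are left untouched
  }
}

wrap_vars(a + b * c ~ I(d) + e)
## Factor(a) + Factor(b) * Factor(c) ~ I(Factor(d)) + Factor(e)
```

Calls with empty arguments, such as x[, 1], are not handled by this sketch.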
To apply an arbitrary transformation:
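update_formula2 <- function(x, trans){
  if(length(x) == 3){
    # binary operator: keep the operator, recurse into both operands
    x[[2]] <- update_formula2(x[[2]], trans)
    x[[3]] <- update_formula2(x[[3]], trans)
    return(x)
  }else{
    # leaf: wrap it in a call to the function named by `trans`
    return(substitute(fun(var), list(fun = trans, var = x)))
  }
}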
f <- a + b * c ~ d + e
update_formula2(f, quote(Factor))
# Factor(a) + Factor(b) * Factor(c) ~ Factor(d) + Factor(e)
update_formula2(f, quote(log))
# log(a) + log(b) * log(c) ~ log(d) + log(e)
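As a usage sketch, the transformed formula can be passed straight to a modelling function. The example below uses the built-in mtcars data set; the choice of mpg, wt and hp is an assumption made purely for illustration. update_formula2 is repeated from above so that the block runs on its own.

```r
update_formula2 <- function(x, trans){
  if(length(x) == 3){
    x[[2]] <- update_formula2(x[[2]], trans)
    x[[3]] <- update_formula2(x[[3]], trans)
    return(x)
  }else{
    return(substitute(fun(var), list(fun = trans, var = x)))
  }
}

# Illustrative formula on a built-in data set (all three columns are positive).
f_log <- update_formula2(mpg ~ wt + hp, quote(log))
f_log
## log(mpg) ~ log(wt) + log(hp)

fit <- lm(f_log, data = mtcars)
names(coef(fit))
```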