
R formula: wrap all variables in a transformation

Tags:

r

I have a formula with an arbitrary number of variables on the left and right-hand sides:

a + b * c ~ d + e

This formula can include various operators like + or *. I would like to wrap each variable of the formula in a transformation. For example, if my transformation is called Factor, then the formula above becomes:

Factor(a) + Factor(b) * Factor(c) ~ Factor(d) + Factor(e)

Note that the operators and the structure of the formula are preserved; only the variables are wrapped.

Vincent asked May 07 '21 11:05




2 Answers

1) rrapply. We can use rrapply to walk the formula recursively and wrap every node that is a syntactic name in Factor(...). Alternatively, we could define is.word <- function(x) grepl("^\\w+$", x) to match names that contain only word characters.

library(rrapply)
fo <- a + b * c ~ d + e

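# TRUE for leaves that are syntactically valid names (variables, not operators)
is.word <- function(x) make.names(x) == x
# Wrap a leaf x in a call to Factor()
insert.Factor <- function(x) substitute(Factor(x), list(x = x))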

rrapply(fo, is.word, insert.Factor)
## Factor(a) + Factor(b) * Factor(c) ~ Factor(d) + Factor(e)

If the formula may itself contain function calls, such as

fo2 <- a + b * c ~ I(d) + e

and we want I(Factor(d)) rather than Factor(I)(Factor(d)), then restrict is.word to the formula's variables:

is.word <- function(x) make.names(x) == x && format(x) %in% all.vars(fo2)
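
To see why the all.vars() filter helps, it can be useful to compare it with all.names(), which also reports operators and function names. A small illustration using only base R (fo2 is the formula defined above):

```r
fo2 <- a + b * c ~ I(d) + e

# Variables only: these are the leaves we want to wrap.
all.vars(fo2)
## [1] "a" "b" "c" "d" "e"

# Names that are not variables: operators and functions such as I.
setdiff(all.names(fo2), all.vars(fo2))
## [1] "~" "+" "*" "I"
```

Because I appears only in the second set, the stricter is.word leaves it unwrapped.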

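2) gsub. Convert the formula to a character string, perform the substitution, and convert back. The input, fo, is defined above. Note that \\w+ matches every run of word characters, so function names and numeric constants in the formula would be wrapped as well.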

formula(gsub("(\\w+)", "Factor(\\1)", format(fo)), environment(fo))
## Factor(a) + Factor(b) * Factor(c) ~ Factor(d) + Factor(e)
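
If the formula contains function calls, one possible refinement of the string approach is to substitute only the names reported by all.vars(). The sketch below is an illustration, not part of the original answer: wrap_vars_gsub is a hypothetical helper name, and it assumes variable names consist of word characters only (no dots).

```r
# Hypothetical helper: wrap only the variables reported by all.vars().
# Assumes variable names contain word characters only (no dots).
wrap_vars_gsub <- function(fo, fun = "Factor") {
  vars <- all.vars(fo)
  # Collapse in case deparse() splits a long formula over several strings.
  txt <- paste(deparse(fo), collapse = " ")
  # \\b anchors keep "a" from matching inside a longer name such as "age".
  pattern <- paste0("\\b(", paste(vars, collapse = "|"), ")\\b")
  wrapped <- gsub(pattern, paste0(fun, "(\\1)"), txt, perl = TRUE)
  as.formula(wrapped, env = environment(fo))
}

res <- wrap_vars_gsub(a + b * c ~ I(d) + e)
res
## Factor(a) + Factor(b) * Factor(c) ~ I(Factor(d)) + Factor(e)
```

The function name I is left alone because it is not among the formula's variables.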

3) Transform the data frame. If the variables come from a data frame DF, we could transform its columns instead and leave the formula as is.

DF[] <- lapply(DF, Factor)
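
If DF also holds columns that do not appear in the formula, it may be preferable to transform only the formula's variables. A minimal sketch, assuming a stand-in Factor (the question does not define it) and made-up example data:

```r
# Stand-in transformation for illustration only; the question does not define Factor.
Factor <- function(x) x / max(x)

fo <- a + b * c ~ d + e
# Made-up data; "id" is an extra column that the formula does not use.
DF <- data.frame(a = 1:4, b = 2:5, c = 3:6, d = 4:7, e = 5:8, id = 101:104)

# Transform only the columns named in the formula.
vars <- all.vars(fo)
DF[vars] <- lapply(DF[vars], Factor)
```

Here the id column is left untouched, while a through e are replaced by their transformed values.
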
G. Grothendieck answered Nov 11 '22 01:11



Here is a way to update a formula with a recursive function:

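update_formula <- function(x){
    if(length(x) == 3){
        # Binary operator: element 1 is the operator, 2 and 3 are its operands
        x[[2]] <- update_formula(x[[2]])
        x[[3]] <- update_formula(x[[3]])
        return(x)
    }else{
        # Anything else is treated as a leaf and wrapped in Factor()
        return(substitute(Factor(var), list(var = x)))
    }
}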

f <- a + b * c ~ d + e
update_formula(f)
# Factor(a) + Factor(b) * Factor(c) ~ Factor(d) + Factor(e)

The main idea is that a call to a binary operator is stored as a call object of length 3: the operator followed by its two operands. For example:

> as.list(f)
[[1]]
`~`

[[2]]
a + b * c

[[3]]
d + e

> as.list(f[[2]])
[[1]]
`+`

[[2]]
a

[[3]]
b * c

> as.list(f[[3]])
[[1]]
`+`

[[2]]
d

[[3]]
e

So we update the second and third components each time we encounter a binary operator, and wrap anything else in Factor().
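
One limitation of testing length(x) == 3 is that a unary minus, a one-argument call such as I(d), or a one-sided formula has length 2 and is therefore wrapped as a whole. A possible generalization, sketched here under the assumption that every name appearing as an argument should be wrapped (wrap_vars is a hypothetical name, not part of this answer), recurses into the arguments of any call:

```r
# Hypothetical generalization: recurse into every call, whatever its arity.
wrap_vars <- function(x, fun = quote(Factor)) {
  if (is.call(x)) {
    # Element 1 is the function or operator; leave it alone and
    # recurse into the arguments only.
    for (i in seq_along(x)[-1]) x[[i]] <- wrap_vars(x[[i]], fun)
    x
  } else if (is.name(x)) {
    # A bare variable name: build the call fun(x).
    as.call(list(fun, x))
  } else {
    # Constants such as 1 are returned unchanged.
    x
  }
}

res <- wrap_vars(-a ~ I(d) + log(e) + 1)
res
## -Factor(a) ~ I(Factor(d)) + log(Factor(e)) + 1
```

This sketch does not special-case the dot in y ~ . or operators such as $, so those would need extra handling.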

To apply an arbitrary transformation, pass its name as a quoted symbol:

update_formula2 <- function(x, trans){
    if(length(x) == 3){
        x[[2]] <- update_formula2(x[[2]], trans)
        x[[3]] <- update_formula2(x[[3]], trans)
        return(x)
    }else{
        return(substitute(fun(var), list(fun = trans, var = x)))
    }
}

f <- a + b * c ~ d + e
update_formula2(f, quote(Factor))
# Factor(a) + Factor(b) * Factor(c) ~ Factor(d) + Factor(e)
update_formula2(f, quote(log))
# log(a) + log(b) * log(c) ~ log(d) + log(e)
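
The result is still an ordinary formula, so it can be handed to modelling functions directly. A small check with made-up data (the data frame DF and the variables y and x are assumptions for illustration; update_formula2 is repeated so the snippet runs on its own):

```r
# Same function as above, repeated so this snippet is self-contained.
update_formula2 <- function(x, trans){
    if(length(x) == 3){
        x[[2]] <- update_formula2(x[[2]], trans)
        x[[3]] <- update_formula2(x[[3]], trans)
        return(x)
    }else{
        return(substitute(fun(var), list(fun = trans, var = x)))
    }
}

# Made-up data for illustration.
DF <- data.frame(y = c(1, 10, 100), x = c(1, 2, 3))

f_log <- update_formula2(y ~ x, quote(log))
mf <- model.frame(f_log, data = DF)
names(mf)
## [1] "log(y)" "log(x)"
```

The transformation is evaluated when the model frame is built, so the columns of mf hold log(y) and log(x).
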
mt1022 answered Nov 11 '22 02:11
