
R generate all possible interaction variables

Tags: r, statistics

I have a data frame with variables, say a, b, c, d:

dat <- data.frame(a=runif(1e5), b=runif(1e5), c=runif(1e5), d=runif(1e5))

and would like to generate all possible two-way interaction terms between the columns, that is: ab, ac, ad, bc, bd, cd. In reality my data frame has over 100 columns, so I cannot code this manually. What is the most efficient way to do this (noting that I do not want both ab and ba)?

user3725021 asked Aug 09 '15 14:08



1 Answer

What do you plan to do with all these interaction terms? There are several options; which is best depends on what you are trying to do.

If you want to pass the interactions to a modeling function such as lm or aov, it is very simple. Just use the .^2 syntax:

fit <- lm( y ~ .^2, data=mydf )

The above will call lm and tell it to fit all the main effects and all two-way interactions for the variables in mydf, excluding y.
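To see which terms the .^2 formula produces, here is a minimal sketch on simulated data. The data frame mydf, the response y, and the coefficients used to simulate y are illustrative assumptions, not part of the original question.

```r
# Simulated data; column names and the true model are assumptions for illustration
set.seed(1)
mydf <- data.frame(a = runif(100), b = runif(100), c = runif(100), d = runif(100))
mydf$y <- with(mydf, 1 + a + 2 * b * c + rnorm(100, sd = 0.1))

# `.` expands to every column except the response, and ^2 adds all pairwise interactions
fit <- lm(y ~ .^2, data = mydf)

# Intercept, 4 main effects, and 6 interactions; each pair appears once (a:b, never b:a)
coef_names <- names(coef(fit))
print(coef_names)
```

Each unordered pair appears only once, so there is no need to remove duplicates such as b:a.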

If for some reason you really want to calculate all the interactions yourself, you can use model.matrix:

tmp <- model.matrix( ~.^2, data=iris)

This will include a column for the intercept and columns for the main effects, but you can drop those if you don't want them.
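One way to drop the intercept and main-effect columns is to keep only the columns whose names contain a colon. The sketch below assumes an all-numeric data frame like the dat from the question (shortened to 10 rows here); with factor columns the column names and counts will differ.

```r
# Small stand-in for the question's data frame (10 rows instead of 1e5)
set.seed(1)
dat <- data.frame(a = runif(10), b = runif(10), c = runif(10), d = runif(10))

# Full design matrix: intercept, main effects, and all two-way interactions
mm <- model.matrix(~ .^2, data = dat)

# Interaction columns are named like "a:b", so select on the colon
inter <- mm[, grepl(":", colnames(mm), fixed = TRUE), drop = FALSE]
print(colnames(inter))
```

For numeric columns, each interaction column is simply the elementwise product of the two variables.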

If you need something other than model fitting, you can use the combn function, as @akrun mentions in the comments.
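The comment itself is not reproduced here, so the following is only a sketch of one way combn could be used, assuming numeric columns and that an "interaction" means the elementwise product. The names col_pairs, products, and inter are hypothetical.

```r
# Small stand-in for the question's data frame (10 rows instead of 1e5)
set.seed(1)
dat <- data.frame(a = runif(10), b = runif(10), c = runif(10), d = runif(10))

# All unordered pairs of column names: (a,b), (a,c), ..., (c,d); no (b,a)
col_pairs <- combn(names(dat), 2, simplify = FALSE)

# Multiply each pair of columns and name the result by pasting the names together
products <- lapply(col_pairs, function(p) dat[[p[1]]] * dat[[p[2]]])
names(products) <- vapply(col_pairs, paste, character(1), collapse = "")
inter <- as.data.frame(products)

print(names(inter))
```

With 100 columns this yields choose(100, 2) = 4950 new columns, so memory use is worth checking before running it on the full data.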

Greg Snow answered Oct 09 '22 19:10
