Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: Force data.table to compute all interactions

Here is a data.table:

dat = data.table(var1=rnorm(120), var2=rep(c('a','b','c'),40), var3=rep(c(1,2,3,2,1,2,1,2,2,3,1,2),10))

dat2 = dat[,list(resp = mean(var1)),by=list(var2, var3)]

In dat2, only existing interactions of dat$var2 et dat$var3 are present. How can I force dat2 to contain results for all 9 possible interactions (instead of the 7 rows of dat2) for var2 and var3? If there is no direct solutions with data.table, what is the easiest way to solve this issue?

table(dat$var2, dat$var3)

     1  2  3
  a 20 10 10
  b 20 20  0
  c  0 30 10

Of course, for the interactions where no data exist in dat, dat2 should contain NA in resp.

like image 758
Remi.b Avatar asked Dec 13 '13 12:12

Remi.b


1 Answers

You could set the key and then do a crossjoin using CJ in the i like so...

setkey( dat , var2 , var3 )

# Thanks to @Shadow for pointing out to use unique() in the cross join
dat[ CJ( unique(var2) , unique(var3) ) , mean(var1) ]
#   var2 var3          V1
#1:    a    1 -0.25771923
#2:    a    2  0.04143057
#3:    a    3  0.28878451
#4:    b    1  0.18865887
#5:    b    2  0.53632552
#6:    b    3          NA
#7:    c    1          NA
#8:    c    2  0.38015021
#9:    c    3  0.49809159

And by way of explanation, CJ() creates a data.table in the i of x (in this case dat) to join on. It is formed as the cross product of the vectors supplied to CJ(), which happens to be precisely what you are looking for!

like image 136
Simon O'Hanlon Avatar answered Nov 10 '22 09:11

Simon O'Hanlon