R data.table column names not working within a function

I am trying to use a data.table within a function, and I am trying to understand why my code is failing. I have a data.table as follows:

library(data.table)
DT <- data.table(my_name=c("A","B","C","D","E","F"),my_id=c(2,2,3,3,4,4))
> DT
   my_name my_id
1:       A     2
2:       B     2
3:       C     3
4:       D     3
5:       E     4
6:       F     4

I am trying to create all pairs of "my_name" with different values of "my_id", which for DT would be:

Var1 Var2    
A    C
A    D
A    E
A    F
B    C
B    D
B    E
B    F
C    E
C    F
D    E
D    F

I have a function to return all pairs of "my_name" for a given pair of values of "my_id" which works as expected.

get_pairs <- function(id1,id2,tdt) {
    return(expand.grid(tdt[my_id==id1,my_name],tdt[my_id==id2,my_name]))
}
> get_pairs(2,3,DT)
Var1 Var2
1    A    C
2    B    C
3    A    D
4    B    D

Now, I want to execute this function for all pairs of ids, which I try to do by finding all pairs of ids and then using mapply with the get_pairs function.

> combn(unique(DT$my_id),2)
     [,1] [,2] [,3]
[1,]    2    2    3
[2,]    3    4    4
tid1 <- combn(unique(DT$my_id),2)[1,]
tid2 <- combn(unique(DT$my_id),2)[2,]
mapply(get_pairs, tid1, tid2, DT)
Error in expand.grid(tdt[my_id == id1, my_name], tdt[my_id == id2, my_name]) : 
  object 'my_id' not found

Again, if I try to do the same thing without an mapply, it works.

get_pairs(tid1[1],tid2[1],DT)
Var1 Var2
1    A    C
2    B    C
3    A    D
4    B    D

Why does this function fail only when used within an mapply? I think this has something to do with the scope of data.table names, but I'm not sure.

Alternatively, is there a different/more efficient way to accomplish this task? I have a large data.table with a third id "sample" and I need to get all of these pairs for each sample (e.g. operating on DT[sample=="sample_id",] ). I am new to the data.table package, and I may not be using it in the most efficient way.

Asked by Sam on Jun 25 '15 at 13:06


1 Answer

The function debugonce() is extremely useful in these scenarios.

debugonce(mapply)
mapply(get_pairs, tid1, tid2, DT)

# Hit enter twice
# from within BROWSER
debugonce(FUN)
# Hit enter twice
# you'll be inside your function; now print its third argument, tdt
tdt
# [1] "A" "B" "C" "D" "E" "F"
Q # (to quit debugging mode)

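That is wrong: inside the function, the third argument holds only the my_name column, not the whole data.table. On the i-th call, mapply() takes the i-th element of each input argument and passes those elements to your function. A data.table is also a list (of columns), so instead of passing the entire data.table, mapply() passes one column at a time. That column is a plain character vector, so tdt[my_id == id1, my_name] is ordinary vector subsetting, my_id is looked up as a regular variable, and you get "object 'my_id' not found".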
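To see the column-by-column behaviour directly, here is a minimal sketch. show_arg() is a hypothetical helper (not part of the question) that only reports the class of what it receives, and the ids 1:2 are placeholder values chosen to match the two columns of DT.

```r
library(data.table)

DT <- data.table(my_name = c("A", "B", "C", "D", "E", "F"),
                 my_id   = c(2, 2, 3, 3, 4, 4))

# A data.table is a list with one element per column.
is.list(DT)
length(DT)

# Hypothetical helper: report the class of the second argument.
show_arg <- function(id, tdt) class(tdt)

# Bare DT: each call receives a single column (a plain vector).
seen_bare <- mapply(show_arg, 1:2, DT)

# list(DT): the length-one list is recycled, so each call receives
# the whole data.table.
seen_wrapped <- mapply(show_arg, 1:2, list(DT), SIMPLIFY = FALSE)

print(seen_bare)
print(seen_wrapped)
```

With the bare DT the helper sees a character column and then a numeric column; with list(DT) it sees a data.table on every call.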

So, you can get around this by doing:

mapply(get_pairs, tid1, tid2, list(DT))

But mapply() simplifies the result by default, and therefore you'd get a matrix back. You'll have to use SIMPLIFY = FALSE.

mapply(get_pairs, tid1, tid2, list(DT), SIMPLIFY = FALSE)
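The difference between the two calls can be checked with a short sketch that reuses DT, get_pairs, tid1 and tid2 from the question. The names simplified and kept are only illustrative.

```r
library(data.table)

DT <- data.table(my_name = c("A", "B", "C", "D", "E", "F"),
                 my_id   = c(2, 2, 3, 3, 4, 4))

get_pairs <- function(id1, id2, tdt) {
  expand.grid(tdt[my_id == id1, my_name], tdt[my_id == id2, my_name])
}

id_pairs <- combn(unique(DT$my_id), 2)
tid1 <- id_pairs[1, ]
tid2 <- id_pairs[2, ]

# Default SIMPLIFY = TRUE: the three data frames are collapsed into a
# matrix of list cells (one row per column name, one column per id pair).
simplified <- mapply(get_pairs, tid1, tid2, list(DT))

# SIMPLIFY = FALSE: a plain list with one data frame per id pair.
kept <- mapply(get_pairs, tid1, tid2, list(DT), SIMPLIFY = FALSE)

print(dim(simplified))
print(length(kept))
```

The list form is the one that is easy to bind into a single table afterwards.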

Or simply use Map:

Map(get_pairs, tid1, tid2, list(DT))

Use rbindlist() to bind the results.
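Putting the pieces together, a minimal end-to-end sketch using the DT and get_pairs from the question might look like this. The names id_pairs, res and all_pairs are illustrative, not from the original post.

```r
library(data.table)

DT <- data.table(my_name = c("A", "B", "C", "D", "E", "F"),
                 my_id   = c(2, 2, 3, 3, 4, 4))

get_pairs <- function(id1, id2, tdt) {
  expand.grid(tdt[my_id == id1, my_name], tdt[my_id == id2, my_name])
}

# All unordered pairs of distinct ids: one column per pair.
id_pairs <- combn(unique(DT$my_id), 2)

# Wrap DT in list() so that every call receives the whole table.
res <- Map(get_pairs, id_pairs[1, ], id_pairs[2, ], list(DT))

# Stack the per-pair results into one data.table.
all_pairs <- rbindlist(res)
print(all_pairs)
```

For the example data this gives the 12 pairs listed in the question, although in a different row order because each id pair is expanded separately.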

HTH

Answered by Arun on Sep 28 '22 at 09:09