Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating Multiple Lists Based on Multiple Subsets of a Larger Data.Frame

Tags:

r

subset

plyr

Working in R, I have data of a similar structure to below (code block 1). And I'm looking to create a new data.frame with the following characteristics:

For each unique ID_1 value, I'd like to have two new columns, one containing a list of (ID_2s that share ID_1 & Direction==1) and the other column containing a list of (ID_2s that share ID_1 & Direction==0), (see next code block 2)

Dataset Block 1 (initial):

ID_1    ID_2    Direction
100001  1           1
100001  11          1
100001  111         1
100001  1111        0
100001  11111       0
100001  111111      0
100002  2           1
100002  22          1
100002  222         0
100002  2222        0
100003  3           1
100003  33          1
100003  333         1
100003  3333        0
100003  33333       0
100003  333333      1
100004  4           1
100004  44          1

Converted into:

Dataset Block 2 (desired output):

ID_1    ID_2_D1             ID_2_D0
100001  1,11,111            1111,11111,111111
100002  2,22                222,222
100003  3,33,333,333333     3333,33333
100004  4,44    

I have code that does this, (taking loops of subset of subsets), but I am running this over many millions of unique "ID_1"s, making this very time consuming (hours, I tell ya!!).

Any advice - perhaps using apply() or the plyr() package that might get this to run faster?


Code for reference:

DF <- data.frame(ID_1=c(100001,100001,100001,100001,100001,100001,100002,100002,100002,100002,100003,100003,100003,100003,100003,100003,100004,100004)
                   ,ID_2=c(1,11,111,1111,11111,111111,2,22,222,2222,3,33,333,3333,33333,333333,4,44)
                   ,Direction=c(1,1,1,0,0,0,1,1,0,0,1,1,1,0,0,1,1,1)
                   )

My current (too slow) code:

  DF2 <- data.frame( ID_1=DF[!duplicated(DF$ID_1),][,1])

  for (i in 1:length(unique(DF2$ID_1))){
    DF2$ID_2_D1[i] <- list(subset(DF,ID_1==unique(DF2$ID_1)[i] & Direction==1)$ID_2)
    DF2$ID_2_D0[i] <- list(subset(DF,ID_1==unique(DF2$ID_1)[i] & Direction==0)$ID_2)        
  }
like image 810
EconomiCurtis Avatar asked Dec 09 '22 17:12

EconomiCurtis


1 Answers

Like this:

library(reshape2)
dcast(DF, ID_1 ~ Direction, value.var = "ID_2", list)
#     ID_1                   0                  1
# 1 100001 1111, 11111, 111111         1, 11, 111
# 2 100002           222, 2222              2, 22
# 3 100003         3333, 33333 3, 33, 333, 333333
# 4 100004                                  4, 44    
like image 83
flodel Avatar answered Dec 11 '22 06:12

flodel