Using data.table to aggregate

After multiple suggestions from SO users, I am finally trying to convert my code over to using data.table.

library(data.table)
DT <- data.table(plate = paste0("plate",rep(1:2,each=5)),
                 id = rep(c("CTRL","CTRL","ID1","ID2","ID3"),2),
                 val = 1:10)

> DT
     plate   id val
 1: plate1 CTRL   1
 2: plate1 CTRL   2
 3: plate1  ID1   3
 4: plate1  ID2   4
 5: plate1  ID3   5
 6: plate2 CTRL   6
 7: plate2 CTRL   7
 8: plate2  ID1   8
 9: plate2  ID2   9
10: plate2  ID3  10

What I would like to do is take the average of DT[,val] by plate when the id is "CTRL".

I would normally aggregate the data frame, then use match to map the values back to a new column, 'ctrl'.
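
A minimal sketch of that aggregate-then-match approach in base R, assuming a plain data frame with the same columns as DT above. The names df and ctrl_means are illustrative only and do not come from the original code.

```r
# Illustrative data frame with the same columns as DT.
df <- data.frame(plate = paste0("plate", rep(1:2, each = 5)),
                 id = rep(c("CTRL", "CTRL", "ID1", "ID2", "ID3"), 2),
                 val = 1:10)

# Step 1: average val per plate, using only the CTRL rows.
ctrl_means <- aggregate(val ~ plate, data = df[df$id == "CTRL", ], FUN = mean)

# Step 2: map each row's plate back to its control mean with match().
df$ctrl <- ctrl_means$val[match(df$plate, ctrl_means$plate)]
```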

Using the data.table package I can get:

DT[id=="CTRL",ctrl:=mean(val),by=plate]

> DT
     plate   id val ctrl
 1: plate1 CTRL   1  1.5
 2: plate1 CTRL   2  1.5
 3: plate1  ID1   3   NA
 4: plate1  ID2   4   NA
 5: plate1  ID3   5   NA
 6: plate2 CTRL   6  6.5
 7: plate2 CTRL   7  6.5
 8: plate2  ID1   8   NA
 9: plate2  ID2   9   NA
10: plate2  ID3  10   NA

What I need is really:

DT <- data.table(plate = paste0("plate",rep(1:2,each=5)),
                 id = rep(c("CTRL","CTRL","ID1","ID2","ID3"),2),
                 val = 1:10,
                 ctrl = rep(c(1.5,6.5),each=5))

> DT
     plate   id val ctrl
 1: plate1 CTRL   1  1.5
 2: plate1 CTRL   2  1.5
 3: plate1  ID1   3  1.5
 4: plate1  ID2   4  1.5
 5: plate1  ID3   5  1.5
 6: plate2 CTRL   6  6.5
 7: plate2 CTRL   7  6.5
 8: plate2  ID1   8  6.5
 9: plate2  ID2   9  6.5
10: plate2  ID3  10  6.5

Eventually I would like to use much more complicated selections of the values, but I do not know how to select specific values, run some function, then map those values back to the appropriate row using data frames.

dayne asked Nov 07 '13 21:11


1 Answer

This is what you want to do:

DT[,ctrl:=mean(val[id=="CTRL"]),by=plate]

which gives

     plate   id val ctrl
 1: plate1 CTRL   1  1.5
 2: plate1 CTRL   2  1.5
 3: plate1  ID1   3  1.5
 4: plate1  ID2   4  1.5
 5: plate1  ID3   5  1.5
 6: plate2 CTRL   6  6.5
 7: plate2 CTRL   7  6.5
 8: plate2  ID1   8  6.5
 9: plate2  ID2   9  6.5
10: plate2  ID3  10  6.5

Your original code, DT[id=="CTRL",ctrl:=mean(val),by=plate], made no assignment for rows where id=="CTRL" is false. The first argument of [ subsets the data.table, and the operations in the second argument are carried out only on that subset, so the remaining rows are left as NA.
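
To see the difference side by side, here is a small sketch that runs both forms on the example data. The column names ctrl_i and ctrl_j are made up for this illustration so that the two results can sit in the same table.

```r
library(data.table)

DT <- data.table(plate = paste0("plate", rep(1:2, each = 5)),
                 id = rep(c("CTRL", "CTRL", "ID1", "ID2", "ID3"), 2),
                 val = 1:10)

# Subsetting in the first argument: only the CTRL rows are grouped and
# assigned to, so every other row keeps NA in ctrl_i.
DT[id == "CTRL", ctrl_i := mean(val), by = plate]

# Subsetting inside the second argument: every row takes part in the
# grouping, and only the values passed to mean() are limited to CTRL rows.
DT[, ctrl_j := mean(val[id == "CTRL"]), by = plate]

print(DT)
```

The same pattern should carry over to more complicated selections, since any logical expression can be used to index val inside the second argument.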

Frank answered Oct 12 '22 14:10