R Dynamically build "list" in data.table (or ddply)

Question

My aggregation needs vary among columns / data.frames. I would like to pass the "list" argument to the data.table dynamically.

As a minimal example:

require(data.table)
type <- c(rep("hello", 3), rep("bye", 3), rep("ok",3))
a <- (rep(1:3, 3))
b <- runif(9)
c <- runif(9)
df <- data.frame(cbind(type, a, b, c), stringsAsFactors=F)
DT <-data.table(df)

This call:

DT[, list(suma = sum(as.numeric(a)), meanb = mean(as.numeric(b)), minc = min(as.numeric(c))), by= type]

will have result similar to this:

    type suma     meanb      minc
1: hello    6 0.1332210 0.4265579
2:   bye    6 0.5680839 0.2993667
3:    ok    6 0.5694532 0.2069026

Future data.frames will have more columns that I will want to summarize differently. But for the sake of working with this small example: Is there a way to pass the list programatically?

I naïvely tried:

# create a different list
mylist <- "list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))"
# new call
DT[, mylist, by=type]

With the following error:

1: hello
2:   bye
3:    ok
mylist
1: list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))
2: list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))
3: list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))

Any hints appreciated! Best regards!

PS sorry about these as.numeric(), I could not quite figure out why, but I needed them for the example to run.

Minor edit inserted columns / before data.frame in initial sentence to clarify my needs.

mnel · Accepted Answer

This is explained FAQ 1.6 what you are looking for is quote and eval

something like

 mycall <- quote(list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c))))

 DT[, eval(mycall)]

After a bit of head-banging, here is a very ugly way of constructing the call for ddply using .()

myplyrcall <- .(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))

do.call(ddply,c(.data = quote(DF), .variables = 'type',.fun = quote(summarise),myplyrcall))

You could also use as.quoted which has an as.quoted.character method to construct using paste0

myplc <-as.quoted(c("lengtha" = "length(as.numeric(a))", "maxb" = "max(as.numeric(b))", "meanc" = "mean(as.numeric(c))"))

This can be used with data.table as well!

dtcall <- as.quoted(mylist)[[1]]


DT[,eval(dtcall), by = type]

data.table all the way.

Arun · Answer

Another way is to use .SDcols to group the columns for which you'd like to perform the same operations together. Let's say that you require columns a,d,e to be summed by type where as, b,g should have mean taken and c,f its median, then,

# constructing an example data.table:
set.seed(45)
dt <- data.table(type=rep(c("hello","bye","ok"), each=3), a=sample(9), 
                 b = rnorm(9), c=runif(9), d=sample(9), e=sample(9), 
                 f = runif(9), g=rnorm(9))

#     type a          b         c d e         f          g
# 1: hello 6 -2.5566166 0.7485015 9 6 0.5661358 -2.2066521
# 2: hello 3  1.1773119 0.6559926 3 3 0.4586280 -0.8376586
# 3: hello 2 -0.1015588 0.2164430 1 7 0.9299597  1.7216593
# 4:   bye 8 -0.2260640 0.3924327 8 2 0.1271187  0.4360063
# 5:   bye 7 -1.0720503 0.3256450 7 8 0.5774691  0.7571990
# 6:   bye 5 -0.7131021 0.4855804 6 9 0.2687791  1.5398858
# 7:    ok 1 -0.4680549 0.8476840 2 4 0.5633317  1.5393945
# 8:    ok 4  0.4183264 0.4402595 4 1 0.7592801  2.1829996
# 9:    ok 9 -1.4817436 0.5080116 5 5 0.2357030 -0.9953758

# 1) set key
setkey(dt, "type")

# 2) group col-ids by similar operations
id1 <- which(names(dt) %in% c("a", "d", "e"))
id2 <- which(names(dt) %in% c("b","g"))
id3 <- which(names(dt) %in% c("c","f"))

# 3) now use these ids in with .SDcols parameter
dt1 <- dt[, lapply(.SD, sum), by="type", .SDcols=id1]
dt2 <- dt[, lapply(.SD, mean), by="type", .SDcols=id2]
dt3 <- dt[, lapply(.SD, median), by="type", .SDcols=id3]

# 4) merge them.
dt1[dt2[dt3]]

#     type  a  d  e          b          g         c         f
# 1:   bye 20 21 19 -0.6704055  0.9110304 0.3924327 0.2687791
# 2: hello 11 13 16 -0.4936211 -0.4408838 0.6559926 0.5661358
# 3:    ok 14 11 10 -0.5104907  0.9090061 0.5080116 0.5633317

If/when you have many many column, making a list like the one you've might be cumbersome.

R Dynamically build "list" in data.table (or ddply)

Tags:

r

aggregation

data.table

plyr

jjap

2 Answers

mnel

Arun

Recent Activity

Donate For Us

R Dynamically build "list" in data.table (or ddply)

Tags:

r

aggregation

data.table

plyr

jjap

2 Answers

mnel

Arun

Related questions

Recent Activity

Donate For Us