When applying a function with multiple output variables (e.g., a list) to a subset of a data.table, I lose the variable names. Is there a way to retain them?
library(data.table)
foo <- function(x){
list(mn = mean(x), sd = sd(x))
}
bar <- data.table(x=1:8, y=c("d","e","f","g"))
# column names "mn" and "sd" are replaced by "V1" and "V2"
bar[, sapply(.SD, foo), by = y, .SDcols="x"]
# column names "mn" and "sd" are retained
bar_split <- split(bar$x, bar$y)
t(sapply(bar_split, foo))
The setNames function lets you add the missing names back by supplying them as a character vector:
bar[, setNames( sapply(.SD, foo), c("mn", "sd")), by = y, .SDcols="x"]
   y mn       sd
1: d  3 2.828427
2: e  4 2.828427
3: f  5 2.828427
4: g  6 2.828427
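As a side note, when only a single column is involved, the names can probably be kept without setNames at all by calling foo() directly in j instead of going through sapply(.SD, ...). The sketch below reuses foo and bar from the question; res is just an illustrative name. It relies on data.table taking column names from a named list returned in j.

```r
library(data.table)

foo <- function(x) {
  list(mn = mean(x), sd = sd(x))
}
bar <- data.table(x = 1:8, y = c("d", "e", "f", "g"))

# foo(x) returns a named list, and data.table uses those names
# ("mn", "sd") for the result columns of each group.
res <- bar[, foo(x), by = y]
names(res)
```

This only covers the one-column case; with several columns in .SDcols the names would collide, which is where the approaches further down come in.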
The authors suggested the other form, also recommended by Arenburg, which names the new columns explicitly and assigns them by reference with the := operator:
DT[, c('x2', 'y2') := list(x / sum(x), y / sum(y)), by = grp]
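Applied to the data from the question, that form might look like the sketch below. It assumes the goal is to add the statistics as new columns to every row rather than to get one row per group; bar2 is a hypothetical copy used so that the original bar is not modified by reference.

```r
library(data.table)

foo <- function(x) {
  list(mn = mean(x), sd = sd(x))
}
bar <- data.table(x = 1:8, y = c("d", "e", "f", "g"))

# copy() avoids changing bar itself, since := modifies in place.
bar2 <- copy(bar)

# The names on the left-hand side are assigned to the two list
# elements returned by foo(x); each group value is recycled
# across the rows of that group.
bar2[, c("mn", "sd") := foo(x), by = y]
names(bar2)
```

Note that the names still have to be typed out on the left-hand side, so this does not remove the manual step.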
I would go with the following, which is a bit awkward but doesn't require writing the names manually, no matter how many functions there are:
bar[, as.list(unlist(lapply(.SD, foo))), by = y, .SDcols = "x"]
#    y x.mn     x.sd
# 1: d    3 2.828427
# 2: e    4 2.828427
# 3: f    5 2.828427
# 4: g    6 2.828427
The biggest advantage of this approach is that it binds the function names to the column names. If, for example, you had an additional column, the same code would still give an informative result:
set.seed(1)
bar[, z := sample(8)]
bar[, as.list(unlist(lapply(.SD, foo))), by = y, .SDcols = c("x", "z")]
#    y x.mn     x.sd z.mn      z.sd
# 1: d    3 2.828427  2.0 1.4142136
# 2: e    4 2.828427  7.5 0.7071068
# 3: f    5 2.828427  3.0 1.4142136
# 4: g    6 2.828427  5.5 0.7071068
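To see where names such as x.mn come from, it may help to run the same lapply/unlist/as.list chain outside data.table. The sketch below uses base R only; cols, per_col and flat are illustrative names, and the two small vectors stand in for the values of one group.

```r
foo <- function(x) {
  list(mn = mean(x), sd = sd(x))
}

# Stand-in for .SD within a single group: a named list of columns.
cols <- list(x = c(1, 5), z = c(2, 4))

# One named list of statistics per column.
per_col <- lapply(cols, foo)

# unlist() flattens the nested list and joins the outer and inner
# names with a dot, giving "x.mn", "x.sd", "z.mn", "z.sd".
flat <- unlist(per_col)
names(flat)

# as.list() turns the named vector back into a named list, which is
# the shape data.table expects from j.
as.list(flat)
```

One caveat with this pattern: unlist() coerces everything to a common type, so it is safest when all the statistics returned by the function are of the same type, as they are here.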