I want the same result as in "R summarizing multiple columns with data.table", but for several summary functions.
Here is an example:
data <- as.data.table(list(x1 = runif(200), x2 = 10*runif(200), group = factor(sample(letters[1:2]))))
res <- data[, rbindlist(lapply(.SD, function(x) {
        return(list(name = "varname", mean = mean(x), sd = sd(x)))
    }))
    , by = group, .SDcols = c("x1", "x2")
]
I get the following result:
   group    name      mean        sd
1:     b varname 0.5755798 0.2723767
2:     b varname 5.5108886 2.7649262
3:     a varname 0.4906111 0.3060961
4:     a varname 4.7780189 2.9740149
How can I get the column names ('x1', 'x2') in the second column? I guess I need to replace rbindlist
with something else, but what? Is there a simple solution?
An alternative would be to write your own function. That avoids the rbindlist
wrapper (which I find unnecessary) and gives you the freedom to build the function the way you want:
tmp <- function(x) {
    mm <- colMeans(x)
    ss <- sapply(x, sd)
    list(names = names(x), mean = mm, sd = ss)
}
data[, tmp(.SD), by = group]
   group names      mean        sd
1:     a    x1 0.4988514 0.2770122
2:     b    x1 0.5246786 0.3014248
3:     a    x2 4.8031253 2.7978401
4:     b    x2 4.9104108 2.9135656
You can iterate your lapply over names(.SD) instead of .SD. Something like this:
data <- as.data.table(list(x1 = runif(200), x2 = 10*runif(200), group = factor(sample(letters[1:2]))))
res <- data[, rbindlist(lapply(names(.SD), function(name) {
        return(list(name = name, mean = mean(.SD[[name]]), sd = sd(.SD[[name]])))
    }))
    , by = group, .SDcols = c("x1", "x2")]
Which gives:
   group name      mean        sd
1:     b   x1 0.5344272 0.2697610
2:     b   x2 4.7628178 2.8313825
3:     a   x1 0.5008916 0.2686017
4:     a   x2 4.6175027 2.8942875
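Another option that stays close to the code in the question is to keep rbindlist and let its idcol argument fill in the names. Since lapply(.SD, ...) returns a list named after the columns, rbindlist can use those names as an id column. The following is only a sketch: the set.seed() call and the use of rep() to build the group column are my own assumptions, added so the example is reproducible and has balanced groups, and are not part of the original post.

```r
library(data.table)

set.seed(1)  # assumption: added only so the sketch is reproducible
data <- data.table(
    x1 = runif(200),
    x2 = 10 * runif(200),
    group = factor(rep(letters[1:2], 100))  # assumption: explicit balanced groups
)

res <- data[, rbindlist(
        # lapply over .SD keeps the column names as list names
        lapply(.SD, function(x) list(mean = mean(x), sd = sd(x))),
        idcol = "name"  # the list names ("x1", "x2") become the "name" column
    )
    , by = group, .SDcols = c("x1", "x2")]

print(res)
```

This should give one row per group and variable, with the columns group, name, mean and sd, without having to index .SD by name inside the function.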