I can achieve this task, but I feel like there must be a "best" (slickest, most compact, clearest-code, fastest?) way of doing it and have not figured it out so far ...
For a specified set of categorical factors I want to construct a table of means and variances by group.
Generate data:

    set.seed(1001)
    d <- expand.grid(f1=LETTERS[1:3], f2=letters[1:3],
                     f3=factor(as.character(as.roman(1:3))), rep=1:4)
    d$y <- runif(nrow(d))
    d$z <- rnorm(nrow(d))
Desired output:

      f1 f2  f3    y.mean      y.var
    1  A  a   I 0.6502307 0.09537958
    2  A  a  II 0.4876630 0.11079670
    3  A  a III 0.3102926 0.20280568
    4  A  b   I 0.3914084 0.05869310
    5  A  b  II 0.5257355 0.21863126
    6  A  b III 0.3356860 0.07943314
    ... etc. ...
Using aggregate/merge:

    library(reshape)
    m1 <- aggregate(y~f1*f2*f3, data=d, FUN=mean)
    m2 <- aggregate(y~f1*f2*f3, data=d, FUN=var)
    mvtab <- merge(rename(m1, c(y="y.mean")),
                   rename(m2, c(y="y.var")))
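For comparison, the two aggregate calls and the merge can probably be collapsed into one pass in base R by letting FUN return a named vector. This is only a sketch, not necessarily the "best" way: aggregate then stores a two-column matrix in y, and do.call(data.frame, ...) flattens it into ordinary columns. The name mvtab_base is made up for illustration.

```r
set.seed(1001)
d <- expand.grid(f1 = LETTERS[1:3], f2 = letters[1:3],
                 f3 = factor(as.character(as.roman(1:3))), rep = 1:4)
d$y <- runif(nrow(d))

# One pass: FUN returns a named vector, so `y` becomes a 2-column matrix
agg <- aggregate(y ~ f1 + f2 + f3, data = d,
                 FUN = function(x) c(mean = mean(x), var = var(x)))

# Flatten the matrix column into regular columns y.mean and y.var
mvtab_base <- do.call(data.frame, agg)
head(mvtab_base)
```

Rows come back in aggregate's own group order (f1 varying fastest), so the result may need sorting to match the ordering of the desired output.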
Using ddply/summarise (possibly best, but I haven't been able to make it work):

    mvtab2 <- ddply(subset(d, select=-c(z,rep)),
                    .(f1,f2,f3),
                    summarise, numcolwise(mean), numcolwise(var))
results in

    Error in output[[var]][rng] <- df[[var]] :
      incompatible types (from closure to logical) in subassignment type fix
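One plausible reading of this error is that summarise expects named expressions that evaluate to vectors, whereas numcolwise(mean) evaluates to a function (a closure), which cannot be stored in a data frame column. A minimal sketch that spells out the summaries for y explicitly, assuming the plyr package is installed:

```r
library(plyr)

set.seed(1001)
d <- expand.grid(f1 = LETTERS[1:3], f2 = letters[1:3],
                 f3 = factor(as.character(as.roman(1:3))), rep = 1:4)
d$y <- runif(nrow(d))

# Each named argument after `summarise` is evaluated within one group
mvtab2 <- ddply(d, .(f1, f2, f3), summarise,
                y.mean = mean(y), y.var = var(y))
head(mvtab2)
```

This gives up the "all numeric columns at once" convenience of numcolwise in exchange for explicit column names, so it needs one pair of arguments per response variable.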
Using melt/cast (maybe best?):

    mvtab3 <- cast(melt(subset(d, select=-c(z,rep)),
                        id.vars=1:3),
                   ...~., fun.aggregate=c(mean,var))
    ## now have to drop "variable"
    mvtab3 <- subset(mvtab3, select=-variable)
    ## also should rename response variables
Won't (?) work in reshape2. Explaining the ...~. formula to someone could be tricky!
Here is a solution using data.table:

    library(data.table)
    d2 = data.table(d)
    ans = d2[, list(avg_y = mean(y), var_y = var(y)), by = 'f1, f2, f3']