
quick/elegant way to construct mean/variance summary table

I can achieve this task, but I feel like there must be a "best" (slickest, most compact, clearest-code, fastest?) way of doing it and have not figured it out so far ...

For a specified set of categorical factors I want to construct a table of means and variances by group.

generate data:

set.seed(1001)
d <- expand.grid(f1=LETTERS[1:3],f2=letters[1:3],
                 f3=factor(as.character(as.roman(1:3))),rep=1:4)
d$y <- runif(nrow(d))
d$z <- rnorm(nrow(d))

desired output:

  f1 f2  f3    y.mean      y.var
1  A  a   I 0.6502307 0.09537958
2  A  a  II 0.4876630 0.11079670
3  A  a III 0.3102926 0.20280568
4  A  b   I 0.3914084 0.05869310
5  A  b  II 0.5257355 0.21863126
6  A  b III 0.3356860 0.07943314
... etc. ...

using aggregate/merge:

library(reshape)
m1 <- aggregate(y~f1*f2*f3,data=d,FUN=mean)
m2 <- aggregate(y~f1*f2*f3,data=d,FUN=var)
mvtab <- merge(rename(m1,c(y="y.mean")),
               rename(m2,c(y="y.var")))
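The two aggregate calls plus merge can probably be collapsed into one pass in base R. This is a minimal sketch, not a claim about the best approach: it assumes it is acceptable for the aggregating function to return a named vector, and uses do.call(data.frame, ...) to flatten the resulting matrix column into y.mean and y.var. The name `agg` is introduced here for illustration only.

```r
# Same example data as in the question (z omitted because it is not summarised).
set.seed(1001)
d <- expand.grid(f1 = LETTERS[1:3], f2 = letters[1:3],
                 f3 = factor(as.character(as.roman(1:3))), rep = 1:4)
d$y <- runif(nrow(d))

# One aggregate call; FUN returns both statistics as a named vector,
# so the "y" column of the result is a two-column matrix.
agg <- aggregate(y ~ f1 + f2 + f3, data = d,
                 FUN = function(v) c(mean = mean(v), var = var(v)))

# Flatten the matrix column into ordinary columns named y.mean and y.var.
mvtab <- do.call(data.frame, agg)
head(mvtab)
```

Note that aggregate orders rows with f1 varying fastest, so the row order differs from the desired output shown above even though the values per group are the same.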

using ddply/summarise (possibly best but haven't been able to make it work):

mvtab2 <- ddply(subset(d,select=-c(z,rep)),
                .(f1,f2,f3),
                summarise,numcolwise(mean),numcolwise(var))

results in

Error in output[[var]][rng] <- df[[var]] :
  incompatible types (from closure to logical) in subassignment type fix

using melt/cast (maybe best?)

mvtab3 <- cast(melt(subset(d,select=-c(z,rep)),
                    id.vars=1:3),
               ...~.,fun.aggregate=c(mean,var))
## now have to drop "variable"
mvtab3 <- subset(mvtab3,select=-variable)
## also should rename response variables

Won't (?) work in reshape2. Explaining ...~. to someone could be tricky!

Ben Bolker asked Sep 16 '11 18:09


1 Answer

Here is a solution using data.table

library(data.table)
d2 = data.table(d)
ans = d2[,list(avg_y = mean(y), var_y = var(y)), 'f1, f2, f3']
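For comparison, here is a minimal sketch of a ddply/summarise call along the lines the question was attempting, assuming the plyr package is installed. summarise expects named expressions such as y.mean=mean(y) rather than function objects like numcolwise(mean), which is a plausible reason the original attempt failed. The data are regenerated so the block runs on its own.

```r
library(plyr)

# Same example data as in the question.
set.seed(1001)
d <- expand.grid(f1 = LETTERS[1:3], f2 = letters[1:3],
                 f3 = factor(as.character(as.roman(1:3))), rep = 1:4)
d$y <- runif(nrow(d))
d$z <- rnorm(nrow(d))

# Split by the three factors and compute both statistics per group.
# Columns not mentioned (z, rep) are simply dropped from the result.
mvtab2 <- ddply(d, .(f1, f2, f3), summarise,
                y.mean = mean(y), y.var = var(y))
head(mvtab2)
```

This names the output columns directly, so no separate rename or merge step is needed.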
Ramnath answered Oct 21 '22 03:10