I'm new to R and to Stack Overflow, so I'm sorry if the question or its format isn't ideal...
I'm trying to get some basic statistics from a matrix using ddply, and I wanted to make the process a bit faster by using a for loop. Unfortunately, this wasn't as easy as I had thought...
Strain gene1 gene2 gene3 . . .
A 2.6336700 1.42802 0.935742
A 2.0634700 2.31232 1.096320
A 2.5798600 2.75138 0.714647
B 2.6031200 1.31374 1.214920
B 2.8319400 1.30260 1.191770
B 1.9796000 1.74199 1.056490
C 2.4030300 1.20324 1.069800
.
.
.
----------
for (n in c("gene1","gene2","gene3","gene4")) {
  summary <- ddply(Data, .(Strain), summarise,
                   mean = mean(n),
                   sd = sd(n),
                   se = sd(n) / sqrt(length(n)) )
}
In the results, mean = 6 and both sd and se are "NA", which is obviously not what I had in mind.
If I get rid of the for loop and insert the column name (gene1) manually:
summary <- ddply(Data, .(Strain), summarise,
                 mean = mean(gene1),
                 sd = sd(gene1),
                 se = sd(gene1) / sqrt(length(gene1)) )
Now it seems to give me the correct result. Can someone enlighten me on this matter and tell me what I'm doing wrong?
I know you didn't ask for it, but here is a solution with aggregate in base R.
# One line in base.
aggregate(Data[paste0('gene',1:3)],by=Data['Strain'],
function(x) c(mean=mean(x),sd=sd(x),se=sd(x)/sqrt(length(x))))
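As for why the loop misbehaves: inside the loop, n holds a character string such as "gene1", so mean(n) and sd(n) work on that string rather than on the column's values, and each iteration also overwrites summary. A minimal base R sketch of one way around this, assuming a small Data frame copied from the first rows of the question (the stats helper and the results list are names made up for illustration):

```r
# Toy data: the first six rows shown in the question (gene4 is not shown there).
Data <- data.frame(
  Strain = c("A", "A", "A", "B", "B", "B"),
  gene1  = c(2.63367, 2.06347, 2.57986, 2.60312, 2.83194, 1.97960),
  gene2  = c(1.42802, 2.31232, 2.75138, 1.31374, 1.30260, 1.74199),
  gene3  = c(0.935742, 1.096320, 0.714647, 1.214920, 1.191770, 1.056490)
)

# Hypothetical helper: mean, sd and standard error of a numeric vector.
stats <- function(x) {
  c(mean = mean(x), sd = sd(x), se = sd(x) / sqrt(length(x)))
}

genes <- c("gene1", "gene2", "gene3")
results <- list()  # one entry per gene, so nothing gets overwritten

for (n in genes) {
  # Data[[n]] looks the column up by its name; a bare n is only the string.
  by_strain <- split(Data[[n]], Data$Strain)
  # sapply gives one column per strain; t() turns that into one row per strain.
  results[[n]] <- t(sapply(by_strain, stats))
}

results$gene1
```

Each element of results is a small matrix with one row per strain and the columns mean, sd and se, so results$gene2 and results$gene3 can be inspected the same way.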