 

Mean of variable by two factors

Tags:

r

tapply

I have the following data:

a <- c(1,1,1,1,2,2,2,2)
b <- c(2,4,6,8,2,3,4,1)
c <- factor(c("A","B","A","B","A","B","A","B"))
df <- data.frame(
    sp=a,
    length=b,
    method=c)

I can use the following to get a count of the number of samples of each species by method:

library(plyr)  # count() comes from plyr; it is not a base R function
n <- with(df, tapply(sp, method, function(x) count(x)))
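For comparison, here is a base-R sketch that needs no extra package. table() cross-tabulates species against method, so each cell is the number of samples of that species measured with that method. The data frame is rebuilt with the same values so the snippet runs on its own; the name `n_tab` is only illustrative.

```r
# Rebuild the example data (same values as in the question)
df <- data.frame(
  sp     = c(1, 1, 1, 1, 2, 2, 2, 2),
  length = c(2, 4, 6, 8, 2, 3, 4, 1),
  method = factor(c("A", "B", "A", "B", "A", "B", "A", "B"))
)

# Cross-tabulate: rows are species, columns are methods,
# each cell counts the rows with that species/method pair
n_tab <- with(df, table(sp, method))
print(n_tab)
```

With this data every species/method cell holds 2 samples.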

How do I also get the mean length by method for each species?

asked May 21 '13 at 07:05 by Ben




2 Answers

Personally, I would use aggregate:

aggregate(length ~ sp, data = df, FUN = mean)
# by species only
#     sp length
#1  1    5.0
#2  2    2.5

aggregate(length ~ sp + method, data = df, FUN = mean)
# by species and method
#  sp method length
#1  1      A      4
#2  2      A      3
#3  1      B      6
#4  2      B      2
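Because the question is tagged tapply, here is a minimal sketch of the same species-by-method means using tapply(). Passing a list of two grouping factors as the INDEX argument makes it return a matrix (species in rows, methods in columns) rather than a long data frame. The name `mean_mat` is only illustrative.

```r
df <- data.frame(
  sp     = c(1, 1, 1, 1, 2, 2, 2, 2),
  length = c(2, 4, 6, 8, 2, 3, 4, 1),
  method = factor(c("A", "B", "A", "B", "A", "B", "A", "B"))
)

# tapply() splits `length` by every sp/method combination and applies
# mean() to each group; the list names become the dimnames names
mean_mat <- with(df, tapply(length, list(sp = sp, method = method), mean))
print(mean_mat)
```

Here mean_mat["1", "B"] is 6, matching the sp 1, method B row of the aggregate() output.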

To get the counts and the mean together, return a named vector from the summary function:

aggregate(length ~ method, data = df, function(x) c(m = mean(x), counts = length(x)) )

# counts and mean for each method
#  method length.m length.counts
#1      A      3.5           4.0
#2      B      4.0           4.0
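The call above groups by method only. To get both numbers for each species within each method, which is what the question asks for, the same summary function can be combined with the two-factor formula. This is a sketch; the do.call(data.frame, ...) step is an optional, commonly used idiom to flatten the matrix column that aggregate() creates when FUN returns a vector.

```r
df <- data.frame(
  sp     = c(1, 1, 1, 1, 2, 2, 2, 2),
  length = c(2, 4, 6, 8, 2, 3, 4, 1),
  method = factor(c("A", "B", "A", "B", "A", "B", "A", "B"))
)

# FUN returns a named vector, so the `length` column of the result
# is a two-column matrix with columns "m" and "counts"
res <- aggregate(length ~ sp + method, data = df,
                 FUN = function(x) c(m = mean(x), counts = length(x)))

# Optional: turn the matrix column into ordinary columns
# (length.m and length.counts)
res <- do.call(data.frame, res)
print(res)
```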
answered Oct 18 '22 at 13:10 by user1317221_G


The plyr package is very helpful for this kind of task:

library(plyr)
new.df <- ddply(df, c("method", "sp"), summarise,
                mean.length=mean(length),
                max.length=max(length),
                n.obs=length(length))

gives you

> new.df
  method sp mean.length max.length n.obs
1      A  1           4          6     2
2      A  2           3          4     2
3      B  1           6          8     2
4      B  2           2          3     2
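For readers who would rather not depend on plyr, here is a base-R sketch of the same table: split() breaks the rows into method/sp groups, lapply() summarises each group into a one-row data frame, and rbind() stacks the results. The intermediate names `groups` and `rows` are only illustrative.

```r
df <- data.frame(
  sp     = c(1, 1, 1, 1, 2, 2, 2, 2),
  length = c(2, 4, 6, 8, 2, 3, 4, 1),
  method = factor(c("A", "B", "A", "B", "A", "B", "A", "B"))
)

# Split the rows by every method/sp combination (unused combinations dropped)
groups <- split(df, list(df$method, df$sp), drop = TRUE)

# Summarise each group into a one-row data frame
rows <- lapply(groups, function(g) {
  data.frame(method      = g$method[1],
             sp          = g$sp[1],
             mean.length = mean(g$length),
             max.length  = max(g$length),
             n.obs       = nrow(g))
})

# Stack the summaries, sort by method then sp, and reset row names
new.df <- do.call(rbind, rows)
new.df <- new.df[order(new.df$method, new.df$sp), ]
rownames(new.df) <- NULL
print(new.df)
```

The result should match the ddply() output row for row.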

More examples at http://www.inside-r.org/packages/cran/plyr/docs/ddply.

answered Oct 18 '22 at 13:10 by Adrian