Possible Duplicate:
R Grouping functions: sapply vs. lapply vs. apply. vs. tapply vs. by vs. aggregate vs.
I'm using R and would love some help with a problem I'm having:
I have a data frame (df) with a column ID and a column Emotion. Each value in ID corresponds to between 40 and 300 values in Emotion (so the number of rows per ID is not fixed). I need to calculate the mean of Emotion for each distinct value of ID. This is what the data looks like:
df$ID = c(1, 1, 1, 1, 2, 2, 3)
df$Emotion = c(2, 4, 6, 4, 1, 1, 8)
so the vector of means should look like this: (4, 1, 8)
Any help would be greatly appreciated!
You can use aggregate:
ID = c(1, 1, 1, 1, 2, 2, 3)
Emotion = c(2, 4, 6, 4, 1, 1, 8)
df <- data.frame(ID, Emotion)
aggregate(.~ID, data=df, mean)
   ID Emotion
1  1       4
2  2       1
3  3       8
sapply combined with split could also be useful (this solution gives you a named vector rather than a data frame):
sapply(split(df$Emotion, df$ID), mean) 
1 2 3 
4 1 8 
There are many other ways to do it, including ddply from the plyr package, the data.table package, other combinations of split and lapply, and dcast from the reshape2 package. See this question for further solutions.
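As one illustration of the "split and lapply" combination mentioned above, here is a minimal base R sketch. It rebuilds the toy data from this thread; the variable names groups and means are only illustrative.

```r
# Same toy data as in the answer above
df <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 3),
                 Emotion = c(2, 4, 6, 4, 1, 1, 8))

# split() breaks Emotion into one numeric vector per ID
groups <- split(df$Emotion, df$ID)

# lapply() returns a list with one mean per group;
# unlist() flattens that list into a vector named by ID
means <- unlist(lapply(groups, mean))
print(means)
```

This should give the same named vector as the sapply version; lapply just leaves the simplification step to you.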
This is precisely the job tapply was designed to do. Pass the values first and the grouping variable second:
tapply(df$Emotion, df$ID, mean)
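A short sketch of the tapply approach on the toy data from the question, assuming the goal is the plain vector (4, 1, 8). Note the argument order: tapply takes the values to summarise first and the grouping variable second. The name group_means is only illustrative.

```r
df <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 3),
                 Emotion = c(2, 4, 6, 4, 1, 1, 8))

# tapply(X, INDEX, FUN): X holds the values, INDEX the grouping variable
group_means <- tapply(df$Emotion, df$ID, mean)

# The result is a one-dimensional array named by ID;
# as.vector() drops the names if a bare numeric vector is wanted
as.vector(group_means)
```

Keeping the names is usually handy, since they record which ID each mean belongs to.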