
Calculating column means based on values in another column [duplicate]

Tags:

r

Possible Duplicate:
R Grouping functions: sapply vs. lapply vs. apply. vs. tapply vs. by vs. aggregate vs.

I'm using R and would love some help with a problem I'm having:

I have a data frame (df) with a column ID and a column Emotion. Each value in ID corresponds to 40-300 values in Emotion (so it's not a fixed number). I need to calculate the mean of the Emotion values for each distinct value of ID. This is what the data looks like:

df <- data.frame(ID      = c(1, 1, 1, 1, 2, 2, 3),
                 Emotion = c(2, 4, 6, 4, 1, 1, 8))

So the vector of means should look like this: (4, 1, 8)

Any help would be greatly appreciated!

Paul Meinshausen asked Nov 16 '12 22:11


2 Answers

You can use aggregate, which returns a data frame with one row per ID:

ID = c(1, 1, 1, 1, 2, 2, 3)
Emotion = c(2, 4, 6, 4, 1, 1, 8)
df <- data.frame(ID, Emotion)


aggregate(.~ID, data=df, mean)
   ID Emotion
1  1       4
2  2       1
3  3       8

sapply combined with split is another option; it returns a named vector instead of a data frame:

sapply(split(df$Emotion, df$ID), mean) 
1 2 3 
4 1 8 

There are many other ways to do this, including ddply from the plyr package, the data.table package, other combinations of split and lapply, and dcast from the reshape2 package. See this question for further solutions.
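As a rough illustration of the "other combinations of split and lapply" mentioned above, here is a minimal base-R sketch on the question's sample data. The variable names group_means and means are only placeholders of mine, not part of the original answer.

```r
# Sample data from the question
df <- data.frame(ID      = c(1, 1, 1, 1, 2, 2, 3),
                 Emotion = c(2, 4, 6, 4, 1, 1, 8))

# split() breaks Emotion into one vector per ID;
# lapply() then computes the mean of each piece and returns a list
group_means <- lapply(split(df$Emotion, df$ID), mean)

# unlist() flattens the list into a named numeric vector
means <- unlist(group_means)
means
# 1 2 3
# 4 1 8
```

This does the same work as the sapply call; the only difference is that lapply always returns a list, so the simplification step is explicit.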

Jilber Urbina answered Nov 18 '22 09:11


This is precisely the job tapply was designed to do.

tapply(df$Emotion, df$ID, mean)
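
For reference, tapply takes the values to summarise as its first argument and the grouping variable as its second. A small sketch on the question's sample data, assuming df is built as in the question; means_by_id is just an illustrative name, and the as.vector step is my own addition for getting the plain (4, 1, 8) vector the question asks for.

```r
# Sample data from the question
df <- data.frame(ID      = c(1, 1, 1, 1, 2, 2, 3),
                 Emotion = c(2, 4, 6, 4, 1, 1, 8))

# tapply(X, INDEX, FUN): X holds the values, INDEX defines the groups
means_by_id <- tapply(df$Emotion, df$ID, mean)
means_by_id
# 1 2 3
# 4 1 8

# Drop the ID labels if only the bare means are needed
as.vector(means_by_id)
# [1] 4 1 8
```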
IRTFM answered Nov 18 '22 10:11