Calculating column means based on values in another column [duplicate]

Question

Possible Duplicate:
R Grouping functions: sapply vs. lapply vs. apply. vs. tapply vs. by vs. aggregate vs.

I'm using R and would love some help with a problem I'm having:

I have a data frame (df) with a column ID and a column Emotion. Each value in ID corresponds to between 40 and 300 values in Emotion (so the group size is not fixed). I need to calculate the mean of the Emotion values for each distinct value of ID. This is what the data looks like:

df <- data.frame(ID      = c(1, 1, 1, 1, 2, 2, 3),
                 Emotion = c(2, 4, 6, 4, 1, 1, 8))

so the vector of means should look like this: (4, 1, 8)

Any help would be greatly appreciated!

Jilber Urbina · Accepted Answer

You can use aggregate, which returns a data frame with one row per ID:

ID = c(1, 1, 1, 1, 2, 2, 3)
Emotion = c(2, 4, 6, 4, 1, 1, 8)
df <- data.frame(ID, Emotion)


aggregate(.~ID, data=df, mean)
   ID Emotion
1  1       4
2  2       1
3  3       8
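The formula `. ~ ID` averages every column other than ID, which is fine here because Emotion is the only one. If the real data frame has more columns, naming the response explicitly may be safer. The sketch below assumes a hypothetical extra column, Age, that is not part of the question's data:

```r
# Example data from the question, plus a hypothetical extra column (Age)
df2 <- data.frame(
  ID      = c(1, 1, 1, 1, 2, 2, 3),
  Emotion = c(2, 4, 6, 4, 1, 1, 8),
  Age     = c(30, 30, 30, 30, 41, 41, 25)  # assumed, not in the original data
)

# Name the response explicitly so that only Emotion is averaged per ID
res <- aggregate(Emotion ~ ID, data = df2, FUN = mean)
print(res)

# The means as a plain numeric vector, ordered by ID
means <- res$Emotion
```

Here `res` has only the columns ID and Emotion, so Age is left out of the summary.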

sapply combined with split also works; this solution gives you a named vector instead of a data frame:

sapply(split(df$Emotion, df$ID), mean) 
1 2 3 
4 1 8

There are many other ways to do this, including ddply from the plyr package, the data.table package, other combinations of split and lapply, and dcast from the reshape2 package. See this question for further solutions.
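As an illustration of one of the base R combinations mentioned, here is a minimal sketch using split with lapply. The variable names (groups, means_list, means, means_v) are made up for this example, and vapply is shown only as an optional type-checked variant:

```r
# Example data from the question
df <- data.frame(
  ID      = c(1, 1, 1, 1, 2, 2, 3),
  Emotion = c(2, 4, 6, 4, 1, 1, 8)
)

# Split Emotion into a list with one numeric vector per ID
groups <- split(df$Emotion, df$ID)

# lapply returns a list of means; unlist flattens it to a named vector
means_list <- lapply(groups, mean)
means <- unlist(means_list)

# vapply does the same in one step and checks that each result is a single number
means_v <- vapply(groups, mean, numeric(1))

print(means)
```

Both `means` and `means_v` are named by ID, so a single group can be looked up with, for example, `means["2"]`.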

IRTFM · Answer

This is precisely the job tapply was designed to do.

tapply(df$Emotion, df$ID, mean)
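tapply takes the values to summarise first and the grouping variable second. A small sketch on the question's data, assuming the goal is the plain vector of means the asker describes (the names by_id and means are only for illustration):

```r
# Example data from the question
df <- data.frame(
  ID      = c(1, 1, 1, 1, 2, 2, 3),
  Emotion = c(2, 4, 6, 4, 1, 1, 8)
)

# Values first, grouping variable second: mean of Emotion within each ID
by_id <- tapply(df$Emotion, df$ID, mean)
print(by_id)

# tapply returns a one-dimensional array named by ID;
# as.vector drops the names and dimensions to leave a plain numeric vector
means <- as.vector(by_id)
```

Keeping `by_id` as it is can be handy when the means need to be looked up by ID later, for example `by_id["3"]`.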

Tags:

r

Paul Meinshausen
