Here is a small reproducible example of my data:
> mydata <- structure(list(subject = c(1, 1, 1, 2, 2, 2), time = c(0, 1, 2, 0, 1, 2), measure = c(10, 12, 8, 7, 0, 0)), .Names = c("subject", "time", "measure"), row.names = c(NA, -6L), class = "data.frame")
> mydata
subject time measure
1 0 10
1 1 12
1 2 8
2 0 7
2 1 0
2 2 0
I would like to generate a new variable containing the mean of measure
for that particular subject, so:
subject time measure mn_measure
1 0 10 10
1 1 12 10
1 2 8 10
2 0 7 2.333
2 1 0 2.333
2 2 0 2.333
Is there an easy way to do this, other than looping through all the records programmatically or reshaping to wide format first?
Use the base R function ave(), which, despite its confusing name, can calculate a variety of statistics, including the mean:
within(mydata, mean<-ave(measure, subject, FUN=mean))
  subject time measure      mean
1       1    0      10 10.000000
2       1    1      12 10.000000
3       1    2       8 10.000000
4       2    0       7  2.333333
5       2    1       0  2.333333
6       2    2       0  2.333333
Note that I use within() just for the sake of shorter code. Here is the equivalent without within():
mydata$mean <- ave(mydata$measure, mydata$subject, FUN=mean)
mydata
  subject time measure      mean
1       1    0      10 10.000000
2       1    1      12 10.000000
3       1    2       8 10.000000
4       2    0       7  2.333333
5       2    1       0  2.333333
6       2    2       0  2.333333
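Because ave() takes the summary function through FUN, the same pattern extends to other per-subject statistics. The following is only a sketch on the example data from the question; the column names max_measure, med_measure and centered are illustrative choices of mine, not something from the original post.

```r
# Same example data as in the question
mydata <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)

# FUN has to be passed by name: any unnamed argument after the first
# is treated by ave() as a grouping variable.
mydata$max_measure <- ave(mydata$measure, mydata$subject, FUN = max)
mydata$med_measure <- ave(mydata$measure, mydata$subject, FUN = median)

# FUN may also return one value per row of the group, e.g. the
# deviation of each measurement from its subject's mean.
mydata$centered <- ave(mydata$measure, mydata$subject,
                       FUN = function(x) x - mean(x))

mydata
```

If FUN is omitted, ave() falls back to the mean, which is where the function name comes from.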
Alternatively, with the data.table package:
library(data.table)
dt <- data.table(mydata, key = "subject")
# := adds the column by reference and returns invisibly, so print dt afterwards
dt[, mn_measure := mean(measure), by = subject]
dt
#    subject time measure mn_measure
# 1:       1    0      10  10.000000
# 2:       1    1      12  10.000000
# 3:       1    2       8  10.000000
# 4:       2    0       7   2.333333
# 5:       2    1       0   2.333333
# 6:       2    2       0   2.333333
You can use ddply() from the plyr package:
library(plyr)
res <- ddply(mydata, .(subject), mutate, mn_measure = mean(measure))
res
  subject time measure mn_measure
1       1    0      10  10.000000
2       1    1      12  10.000000
3       1    2       8  10.000000
4       2    0       7   2.333333
5       2    1       0   2.333333
6       2    2       0   2.333333