I have a data.frame with exactly one value measured for each subject at multiple timepoints. It simplifies to this:
> set.seed(42)
> x = data.frame(subject=rep(c('a', 'b', 'c'), 3), time=rep(c(1,2,3), each=3), value=rnorm(3*3, 0, 1))
> x
subject time value
1 a 1 1.37095845
2 b 1 -0.56469817
3 c 1 0.36312841
4 a 2 0.63286260
5 b 2 0.40426832
6 c 2 -0.10612452
7 a 3 1.51152200
8 b 3 -0.09465904
9 c 3 2.01842371
I want to calculate the change in value
for each timepoint and for each subject. For this simple example, my My current solution is this:
> x$diff[x$time==1] = x$value[x$time==2] - x$value[x$time==1]
> x$diff[x$time==2] = x$value[x$time==3] - x$value[x$time==2]
> x
subject time value diff
1 a 1 1.37095845 -0.7380958
2 b 1 -0.56469817 0.9689665
3 c 1 0.36312841 -0.4692529
4 a 2 0.63286260 0.8786594
5 b 2 0.40426832 -0.4989274
6 c 2 -0.10612452 2.1245482
7 a 3 1.51152200 NA
8 b 3 -0.09465904 NA
9 c 3 2.01842371 NA
... and then remove the last rows. However, in my actual data set, there's way more levels of time
and I need to do this for several columns instead of just value
. The code gets very ugly. Is there a neat way to do this? A solution which does not assume that rows are ordered within subjects according to time
would be nice.
We can use data.table
. Convert the 'data.frame' to 'data.table' (setDT(x)
), grouped by 'subject', we take the difference of the next value (shift(value, type='lead')
) with the current value and assign (:=
) the output to create the 'Diff' column.
library(data.table)#v1.9.6+
setDT(x)[order(time),Diff := shift(value, type= 'lead') - value ,
by = subject]
# subject time value Diff
#1: a 1 1.37095845 -0.7380958
#2: b 1 -0.56469817 0.9689665
#3: c 1 0.36312841 -0.4692529
#4: a 2 0.63286260 0.8786594
#5: b 2 0.40426832 -0.4989274
#6: c 2 -0.10612452 2.1245482
#7: a 3 1.51152200 NA
#8: b 3 -0.09465904 NA
#9: c 3 2.01842371 NA
You can use dplyr
for this:
library(dplyr)
x %>%
arrange(time, subject) %>%
group_by(subject) %>%
mutate(diff = c(diff(value), NA))
# Source: local data frame [9 x 4]
# Groups: subject [3]
#
# subject time value diff
# (fctr) (dbl) (dbl) (dbl)
# 1 a 1 1.30970525 -1.66596287
# 2 b 1 0.12556761 -0.06070412
# 3 c 1 -1.09423634 1.38590546
# 4 a 2 -0.35625763 0.91417329
# 5 b 2 0.06486349 0.06652424
# 6 c 2 0.29166912 -0.98495562
# 7 a 3 0.55791566 NA
# 8 b 3 0.13138773 NA
# 9 c 3 -0.69328649 NA
If you want to get rid of the NA
s, add %>% na.omit
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With