Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculate differences on a variable between factor levels

Tags:

dataframe

r

I have a data.frame with exactly one value measured for each subject at multiple timepoints. It simplifies to this:

> set.seed(42)
> x = data.frame(subject=rep(c('a', 'b', 'c'), 3), time=rep(c(1,2,3), each=3), value=rnorm(3*3, 0, 1))
> x
  subject time       value
1       a    1  1.37095845
2       b    1 -0.56469817
3       c    1  0.36312841
4       a    2  0.63286260
5       b    2  0.40426832
6       c    2 -0.10612452
7       a    3  1.51152200
8       b    3 -0.09465904
9       c    3  2.01842371

I want to calculate the change in value for each timepoint and for each subject. For this simple example, my My current solution is this:

> x$diff[x$time==1] = x$value[x$time==2] - x$value[x$time==1]
> x$diff[x$time==2] = x$value[x$time==3] - x$value[x$time==2]
> x
  subject time       value       diff
1       a    1  1.37095845 -0.7380958
2       b    1 -0.56469817  0.9689665
3       c    1  0.36312841 -0.4692529
4       a    2  0.63286260  0.8786594
5       b    2  0.40426832 -0.4989274
6       c    2 -0.10612452  2.1245482
7       a    3  1.51152200         NA
8       b    3 -0.09465904         NA
9       c    3  2.01842371         NA

... and then remove the last rows. However, in my actual data set, there's way more levels of time and I need to do this for several columns instead of just value. The code gets very ugly. Is there a neat way to do this? A solution which does not assume that rows are ordered within subjects according to time would be nice.

like image 631
Jonas Lindeløv Avatar asked Dec 19 '22 22:12

Jonas Lindeløv


2 Answers

We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(x)), grouped by 'subject', we take the difference of the next value (shift(value, type='lead')) with the current value and assign (:=) the output to create the 'Diff' column.

library(data.table)#v1.9.6+
setDT(x)[order(time),Diff := shift(value, type= 'lead') - value ,
                                              by = subject]
#   subject time       value       Diff
#1:       a    1  1.37095845 -0.7380958
#2:       b    1 -0.56469817  0.9689665
#3:       c    1  0.36312841 -0.4692529
#4:       a    2  0.63286260  0.8786594
#5:       b    2  0.40426832 -0.4989274
#6:       c    2 -0.10612452  2.1245482
#7:       a    3  1.51152200         NA
#8:       b    3 -0.09465904         NA
#9:       c    3  2.01842371         NA
like image 151
akrun Avatar answered Dec 21 '22 12:12

akrun


You can use dplyr for this:

library(dplyr)
x %>% 
  arrange(time, subject) %>% 
  group_by(subject) %>% 
  mutate(diff = c(diff(value), NA))
# Source: local data frame [9 x 4]
# Groups: subject [3]
# 
#   subject  time       value        diff
#    (fctr) (dbl)       (dbl)       (dbl)
# 1       a     1  1.30970525 -1.66596287
# 2       b     1  0.12556761 -0.06070412
# 3       c     1 -1.09423634  1.38590546
# 4       a     2 -0.35625763  0.91417329
# 5       b     2  0.06486349  0.06652424
# 6       c     2  0.29166912 -0.98495562
# 7       a     3  0.55791566          NA
# 8       b     3  0.13138773          NA
# 9       c     3 -0.69328649          NA

If you want to get rid of the NAs, add %>% na.omit.

like image 41
lukeA Avatar answered Dec 21 '22 11:12

lukeA