How to calculate difference from initial value for each group in R?

Tags:

r

I have data arranged like this in R:

indv    time    val
A          6    5
A         10    10
A         12    7
B          8    4
B         10    3
B         15    9

For each individual (indv) at each time, I want to calculate the change in value (val) from the initial time. So I would end up with something like this:

indv time   val val_1   val_change
A       6     5    5       0
A      10    10    5       5
A      12     7    5       2
B       8     4    4       0
B      10     3    4      -1
B      15     9    4       5

Can anyone tell me how I might do this? I can use

ddply(df, .(indv), function(x)x[which.min(x$time), ])

to get a table like

indv    time    val
A          6    5   
B          8    4   

However, I cannot figure out how to make a column val_1 in which each individual's initial value is repeated on every row for that individual. If I can do that, I should be able to add the column val_change using something like:

df$val_change <- df$val - df$val_1

EDIT: Two excellent methods were posted below, but both rely on my time column being sorted so that, within each individual, smaller time values come before larger ones. I'm not sure this will always be the case with my data. (I know I can sort first in Excel, but I'm trying to avoid that.) How could I deal with a case where the table looks like this:

indv    time    value
A          10   10
A           6   5
A          12   7
B           8   4
B          10   3
B          15   9

Thomas asked Nov 14 '12 21:11


2 Answers

Here is a data.table solution that is memory efficient because it adds the new columns by reference within the data.table. Setting the key also sorts the table by the key variables, so val[1] is the value at the earliest time for each individual.

library(data.table)
DT <- data.table(df)  
# set key to sort by indv then time
setkey(DT, indv, time)
DT[, c('val1', 'change') := list(val[1], val - val[1]), by = indv]
# And to show it works....
DT
##    indv time val val1 change
## 1:    A    6   5    5      0
## 2:    A   10  10    5      5
## 3:    A   12   7    5      2
## 4:    B    8   4    4      0
## 5:    B   10   3    4     -1
## 6:    B   15   9    4      5
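
If the rows cannot be guaranteed to arrive sorted and you would rather not reorder the table, one possible variation is to pick the baseline with which.min(time) inside each group instead of relying on val[1]. The following is only a sketch under that assumption: the shuffled sample data and the helper name v0 are made up for illustration, and ties in time resolve to the first matching row.

```r
library(data.table)

# Illustrative unsorted data (same values as the question, rows shuffled)
DT <- data.table(
  indv = c("A", "A", "A", "B", "B", "B"),
  time = c(10, 6, 12, 8, 10, 15),
  val  = c(10, 5, 7, 4, 3, 9)
)

# For each indv, take val at the smallest time as the baseline (v0),
# then add both columns by reference; the row order is left untouched
DT[, c("val1", "change") := {
  v0 <- val[which.min(time)]
  list(v0, val - v0)
}, by = indv]

DT
```

Because no key is set here, the rows stay in their original order, which may matter if the order carries meaning elsewhere.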

mnel answered Sep 22 '22 01:09


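Here's a plyr solution using ddply. Like val[1] in general, it assumes the rows are already sorted by time within each individual:

library(plyr)
# within each indv, repeat the first val and subtract it from every val
ddply(df, .(indv), transform, 
      val_1 = val[1],
      change = (val - val[1]))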

  indv time val val_1 change
1    A    6   5     5      0
2    A   10  10     5      5
3    A   12   7     5      2
4    B    8   4     4      0
5    B   10   3     4     -1
6    B   15   9     4      5

To get your second table, try this:

ddply(df, .(indv), function(x) x[which.min(x$time), ])
  indv time val
1    A    6   5
2    B    8   4
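
That one-row-per-individual table can also be matched back onto the full data to build val_1, which is the step the question got stuck on. Below is a rough base R sketch of that idea; it swaps ddply for split/lapply so that it needs no packages, and the names first, baseline and out are placeholders chosen for this example.

```r
# Sample data from the question
df <- data.frame(
  indv = c("A", "A", "A", "B", "B", "B"),
  time = c(6, 10, 12, 8, 10, 15),
  val  = c(5, 10, 7, 4, 3, 9)
)

# One row per individual: the observation with the smallest time
first <- do.call(rbind, lapply(split(df, df$indv),
                               function(x) x[which.min(x$time), ]))

# Keep only the key and the baseline value, renamed to val_1
baseline <- data.frame(indv = first$indv, val_1 = first$val)

# Match the baseline back onto every row, then take the difference
out <- merge(df, baseline, by = "indv")
out$val_change <- out$val - out$val_1
out
```

Since which.min looks at time directly, this version should not depend on how the rows are ordered.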

Edit 1

To deal with unsorted data, like the table you posted in your edit, try the following:

unsort <- read.table(text="indv    time    value
A          10   10
A           6   5
A          12   7
B           8   4
B          10   3
B          15   9", header = TRUE)


do.call(rbind, lapply(split(unsort, unsort$indv), 
                  function(x) x[order(x$time), ]))
    indv time value
A.2    A    6     5
A.1    A   10    10
A.3    A   12     7
B.4    B    8     4
B.5    B   10     3
B.6    B   15     9

Now you can apply the procedure described above to this sorted data frame (using value in place of val, since that is the column name here).
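
As a rough illustration of the sort-then-compute route, here is a base R sketch that uses order() and ave() rather than the plyr call above; the names sorted, val_1 and change are just choices for this example.

```r
# Unsorted data from the edit (note the column is called value here)
unsort <- data.frame(
  indv  = c("A", "A", "A", "B", "B", "B"),
  time  = c(10, 6, 12, 8, 10, 15),
  value = c(10, 5, 7, 4, 3, 9)
)

# Sort by individual, then by time within each individual
sorted <- unsort[order(unsort$indv, unsort$time), ]

# After sorting, the first value in each group is the initial one
sorted$val_1  <- ave(sorted$value, sorted$indv, FUN = function(v) v[1])
sorted$change <- sorted$value - sorted$val_1
sorted
```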

Edit 2

A shorter way to sort your data frame is the orderBy function from the doBy package:

library(doBy)
orderBy(~ indv + time, unsort)
  indv time value
2    A    6     5
1    A   10    10
3    A   12     7
4    B    8     4
5    B   10     3
6    B   15     9

Edit 3

You can even sort your data frame using ddply, although the columns come back in a different order, as the output shows:

ddply(unsort, .(indv, time), sort)
  value time indv
1     5    6    A
2    10   10    A
3     7   12    A
4     4    8    B
5     3   10    B
6     9   15    B

Jilber Urbina answered Sep 21 '22 01:09