
Computing differences between consecutive values or with the latest non-NA value in a vector in R

I am looking for an R function that computes the difference between each value in a vector and the previous value or, when the previous value is NA, the latest non-NA value. Here's an example:

visit <- c(1,2,3,4)
time <- c(5,10,NA,15)
df <- data.frame(visit ,time)

We are looking for the time since the last visit.

Using diff, we get a length 3 vector:

diff <- diff(df$time, lag = 1, differences = 1)

5 NA NA

The wanted 'diff' vector is:

 5 NA 5

Ideally it would be the same length as the original vector 'time', so that it could be added to the data frame 'df':

  visit | time | diff
    1      5       NA
    2      10      5
    3      NA      NA
    4      15      5

dambach asked Feb 21 '17 16:02



2 Answers

Here's one way, using only basic R operations:

First work out the non-NA diffs by chopping the NAs out:

> cdiffs = diff(df$time[!is.na(df$time)])

Then work out where they go in the result column: every non-NA position except the first, which stays NA because it has no earlier value to subtract:

> cplace = which(!is.na(df$time))[-1]

Now create a column of NAs and fill the diffs into the right places:

> df$diffs = NA
> df$diffs[cplace] = cdiffs
> df
  visit time diffs
1     1    5    NA
2     2   10     5
3     3   NA    NA
4     4   15     5
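
The three steps above could be wrapped in a small reusable function. This is only a sketch: the name diff_since_last is hypothetical and not part of base R, and it assumes a numeric input vector.

```r
# Hypothetical helper: difference between each non-NA value and the
# previous non-NA value; NA positions and the first observation stay NA.
diff_since_last <- function(x) {
  out <- rep(NA_real_, length(x))
  keep <- which(!is.na(x))          # positions of the observed values
  if (length(keep) > 1) {
    # diff() over the observed values only, written back to every
    # observed position except the first.
    out[keep[-1]] <- diff(x[keep])
  }
  out
}

df <- data.frame(visit = c(1, 2, 3, 4), time = c(5, 10, NA, 15))
df$diffs <- diff_since_last(df$time)
df
```

Starting from rep(NA_real_, ...) rather than a plain NA keeps the new column numeric even when there is nothing to fill in.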

Spacedman answered Sep 30 '22 09:09



With the lag and na.locf functions you could do the following:

lag gives access to the previous value, and na.locf (last observation carried forward) replaces each missing value with the most recent non-missing one.

library(zoo)     # for na.locf
library(dplyr)   # for lag (stats::lag only shifts the time base, not the values)

# 'df' is the data frame from the question
df$newDiff = df$time - na.locf(dplyr::lag(df$time), na.rm = FALSE)

df
#  visit time newDiff
#1     1    5      NA
#2     2   10       5
#3     3   NA      NA
#4     4   15       5
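
To see what the lag-then-carry-forward step produces without loading either package, the same intermediate vector can be sketched in base R. This swaps in a cumsum-based lookup rather than the zoo and dplyr calls used above, and prev_non_na is a hypothetical name chosen for illustration.

```r
# Hypothetical helper (not from zoo or dplyr): for each position, return the
# most recent non-NA value found strictly before it, or NA if there is none.
prev_non_na <- function(x) {
  obs <- !is.na(x)
  # Number of non-NA values strictly before each position.
  n_before <- c(0, cumsum(obs))[seq_along(x)]
  # Prepend NA so that "no earlier observation" maps to NA.
  c(NA, x[obs])[n_before + 1]
}

time <- c(5, 10, NA, 15)
prev_non_na(time)          # NA  5 10 10
time - prev_non_na(time)   # NA  5 NA  5
```

The NA at position 3 of the result comes from time itself being missing there, not from the carried-forward value.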

Silence Dogood answered Sep 30 '22 10:09
