I am looking for an R function that computes the difference between each value in a vector and the previous value or, when the previous value is NA, the latest non-NA value. Here's an example:
visit <- c(1,2,3,4)
time <- c(5,10,NA,15)
df <- data.frame(visit ,time)
We are looking for the time since the last visit.
Using diff, we get a length 3 vector:
diff <- diff(df$time, lag = 1, differences = 1)
5 NA NA
The wanted 'diff' vector is:
5 NA 5
Ideally it would have the same length as the original vector 'time', so that it could be added to the data frame 'df':
visit | time | diff
    1 |    5 |   NA
    2 |   10 |    5
    3 |   NA |   NA
    4 |   15 |    5
Here's one way, using only base R operations:
First work out the non-NA diffs by dropping the NAs:
> cdiffs = diff(df$time[!is.na(df$time)])
Then work out where they go in the result column: every non-NA position except the first, which has no earlier value to subtract and therefore stays NA:
> cplace = which(!is.na(df$time))[-1]
Now create a column of NAs and fill the diffs into the right places:
> df$diffs = NA
> df$diffs[cplace] = cdiffs
> df
visit time diffs
1 1 5 NA
2 2 10 5
3 3 NA NA
4 4 15 5
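The three steps above can be wrapped into a small reusable function. This is only a sketch: the name diff_since_last_obs is made up for illustration and is not part of any package, and it assumes a numeric input vector.

```r
# Hypothetical helper (the name is illustrative, not from any package):
# difference between each non-NA value and the previous non-NA value.
diff_since_last_obs <- function(x) {
  out <- rep(NA_real_, length(x))   # result has the same length as x
  ok <- which(!is.na(x))            # positions of the non-NA values
  if (length(ok) > 1) {
    # diffs between consecutive non-NA values are written to every
    # non-NA position except the first one
    out[ok[-1]] <- diff(x[ok])
  }
  out
}

df <- data.frame(visit = c(1, 2, 3, 4), time = c(5, 10, NA, 15))
df$diffs <- diff_since_last_obs(df$time)
df$diffs
# NA  5 NA  5
```

Because the result always has the same length as the input, it can be assigned directly as a new column.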
With the lag and na.locf functions you could do the following. lag provides access to the previous value, and na.locf stands for "last observation carried forward", which fills a missing value with the most recent non-missing one.
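library(zoo)   # for na.locf
library(dplyr) # for lag (had issues with the base lag function, which only shifts a series' time base, not the values)
DF <- data.frame(visit = c(1, 2, 3, 4), time = c(5, 10, NA, 15)) # same data as in the question
DF$newDiff = DF$time - na.locf(dplyr::lag(DF$time), na.rm = FALSE)
DF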
# visit time newDiff
#1 1 5 NA
#2 2 10 5
#3 3 NA NA
#4 4 15 5
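For readers who would rather avoid the two package dependencies, the same "lag, then carry the last observation forward" idea can be approximated in base R. The following is a sketch under that assumption, not the answer's own method; prev_non_na_diff is a hypothetical name used only here.

```r
# Base R sketch of "lag, then carry the last observation forward".
# prev_non_na_diff is a hypothetical name used only for this example.
prev_non_na_diff <- function(x) {
  n <- length(x)
  # index of the most recent non-NA element at or before each position
  # (0 where nothing has been observed yet)
  last_seen <- cummax(seq_len(n) * !is.na(x))
  # shift by one so each position only looks at strictly earlier elements
  prev_idx <- c(0, last_seen)[seq_len(n)]
  prev_idx[prev_idx == 0] <- NA     # no earlier observation available
  x - x[prev_idx]
}

prev_non_na_diff(c(5, 10, NA, 15))
# NA  5 NA  5
```

The cummax call plays the role of na.locf on the positions, and the one-step shift plays the role of lag.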