
apply diff() only on consecutive days

Tags: date, r, diff

I have the following data and I would like to apply diff() only to consecutive days. diff(data$ch, differences = 1, lag = 1) returns the differences between all adjacent values of ch (23-12, 4-23, 78-4, 120-78, 94-120, ...), regardless of the dates. I would like diff() to return NA when the dates are not consecutive. For the data below, the output I am trying to obtain is:

11, -19, 74, NA, -26, NA, -34, 39, NA

Is there anyone who knows how I can do that?

Date        ch
2013-01-01  12
2013-01-02  23
2013-01-03  4
2013-01-04  78
2013-01-10  120
2013-01-11  94
2013-02-26  36
2013-02-27  2
2013-02-28  41
2003-03-05  22
stem asked Aug 03 '15 13:08


3 Answers

You can do this in base R without installing any external packages.

Assuming that the 'Date' column is of Date class, we take the diff of 'Date'. Depending on whether the absolute difference between adjacent elements is greater than 1 or not, we get a logical vector, and its cumulative sum (cumsum) gives a grouping index ('indx') that increases by 1 at every gap.

 indx <- cumsum(c(TRUE,abs(diff(df1$Date))>1))
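To make the grouping index easier to follow, here is a small sketch that rebuilds the dates from the question and keeps the intermediate vectors. The names `dates`, `gaps` and `starts` are only for illustration and are not part of the answer's code.

```r
# Dates from the question, rebuilt here so the snippet runs on its own
dates <- as.Date(c("2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04",
                   "2013-01-10", "2013-01-11", "2013-02-26", "2013-02-27",
                   "2013-02-28", "2003-03-05"))

# Gap in days between each date and the previous one
gaps <- as.numeric(diff(dates))

# TRUE marks the first row of a new run of consecutive days;
# the leading TRUE opens the first run
starts <- c(TRUE, abs(gaps) > 1)

# The running total of the starts labels each run
indx <- cumsum(starts)
indx
# [1] 1 1 1 1 2 2 3 3 3 4
```

Rows that share a label belong to the same run of consecutive days, so any per-group function only ever sees values from adjacent dates.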

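In the second step, we use ave with 'indx' as the grouping vector and take the diff of 'ch' within each group. The output of diff is one element shorter than its input, so we append NA to each group to keep the lengths the same. The result therefore has one value per row (10 here): each row holds the difference to the next row, and the last row of every group is NA. Dropping the final element gives the 9 values asked for in the question.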

 ave(df1$ch, indx, FUN=function(x) c(diff(x),NA))
 #[1]  11 -19  74  NA -26  NA -34  39  NA  NA

data

df1 <- structure(list(Date = structure(c(15706, 15707, 15708, 15709, 
15715, 15716, 15762, 15763, 15764, 12116), class = "Date"), ch = c(12L, 
23L, 4L, 78L, 120L, 94L, 36L, 2L, 41L, 22L)), .Names = c("Date", 
"ch"), row.names = c(NA, -10L), class = "data.frame")
akrun answered Sep 17 '22 19:09


The following simply "...returns NA when the dates are not consecutive", though there may be tricky cases that it does not account for:

replace(diff(df1$ch), abs(diff(df1$Date)) > 1, NA)
#[1]  11 -19  74  NA -26  NA -34  39  NA
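One tricky case the one-liner above may not cover is a repeated date: a gap of 0 days is not greater than 1, so that difference would be kept. A possible variant, assuming "consecutive" means exactly one day later, tests for a gap different from 1 instead. The helper name `diff_consecutive` and the small example data are made up for this sketch.

```r
# Hypothetical helper: differences of `values`, kept only where the
# date moves forward by exactly one day
diff_consecutive <- function(values, dates) {
  gaps <- as.numeric(diff(dates))  # day gaps between neighbouring rows
  replace(diff(values), gaps != 1, NA)
}

# Made-up example with a repeated date and a three-day gap
dates <- as.Date(c("2013-01-01", "2013-01-02", "2013-01-02", "2013-01-05"))
ch <- c(12, 23, 4, 78)

diff_consecutive(ch, dates)
# [1] 11 NA NA
```

With the test `gaps != 1`, unsorted dates (negative gaps) are also set to NA without needing abs().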
alexis_laz answered Sep 18 '22 19:09


Try this with the packages lubridate and dplyr.

If you don't have them, install them once with install.packages("dplyr"); install.packages("lubridate")

Code (create the data frame from the Data section first)

library(lubridate)
library(dplyr)

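# Parse the date strings into Date objects
data$Date <- ymd(data$Date)
# Keep the difference only where the previous row is exactly one day earlier
data2 <- data %>% mutate(diff = ifelse(Date == lag(Date) + days(1), ch - lag(ch), NA))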

Data

data <- 
  data.frame(Date=c("2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04", "2013-01-10", 
                    "2013-01-11", "2013-01-26", "2013-01-27", "2013-01-28", "2013-03-05"),
               ch=c(12, 23, 4, 78, 120, 94, 36, 2, 41, 22))
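If loading dplyr and lubridate is not an option, the same "compare with the previous row" idea can be sketched in base R. This is a swapped-in equivalent rather than the code above; `prev_date` and `prev_ch` are illustrative names, and the data frame repeats this answer's data with the dates already converted by as.Date().

```r
# Same data as in this answer, rebuilt so the snippet runs on its own
data <- data.frame(
  Date = as.Date(c("2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04",
                   "2013-01-10", "2013-01-11", "2013-01-26", "2013-01-27",
                   "2013-01-28", "2013-03-05")),
  ch = c(12, 23, 4, 78, 120, 94, 36, 2, 41, 22)
)

n <- nrow(data)
prev_date <- c(as.Date(NA), data$Date[-n])  # date of the previous row
prev_ch <- c(NA, data$ch[-n])               # value of the previous row

# Difference to the previous row, kept only when that row is one day earlier
data$diff <- ifelse(data$Date == prev_date + 1, data$ch - prev_ch, NA)
data$diff
# [1]  NA  11 -19  74  NA -26  NA -34  39  NA
```

Unlike diff(), this keeps one value per row, so the first row is NA because it has no previous row to compare with.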
dimitris_ps answered Sep 19 '22 19:09