 

Time difference between rows in R dplyr, different units

Here is my example. I am reading the following file, sample_data, whose contents are reproduced inline below:

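library(dplyr)

# Contents of sample_data, read from an inline character vector
txt <- c('"",  "MDN",                  "Cl_Date"',
          '"1",  "A",  "2017-04-15 15:10:42.510"',
          '"2",  "A",  "2017-04-01 14:47:23.210"',
          '"3",  "A",  "2017-04-01 14:49:54.063"',
          '"4",  "B",  "2017-04-30 13:25:00.000"',
          '"5",  "B",  "2017-04-03 17:53:13.217"',
          '"6",  "B",  "2017-04-15 15:17:43.780"')

ts <- read.csv(text = txt, as.is = TRUE)   # as.is = TRUE keeps strings as character
ts$Cl_Date <- as.POSIXct(ts$Cl_Date)       # parse the timestamp strings

# Within each MDN group: sort by time, then take the gap to the previous row
# (the first row of each group gets 0)
ts <- ts %>% group_by(MDN) %>% arrange(Cl_Date) %>%
  mutate(time_diff = c(0, diff(Cl_Date)))
ts <- ts[order(ts$MDN, ts$Cl_Date), ]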

As a result I have

MDN Cl_Date         time_diff
A   4/1/2017 14:47  0
A   4/1/2017 14:49  2.514216665
A   4/15/2017 15:10 20180.80745
B   4/3/2017 17:53  0
B   4/15/2017 15:17 11.89202041
B   4/30/2017 13:25 14.92171551

So I group by the MDN column and compute the difference between consecutive Cl_Date values. As you can see, the differences for group A are in minutes, while those for group B are in days.
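A minimal base-R sketch of what c(0, diff(Cl_Date)) does to a single group (group B's timestamps, parsed with tz = "UTC", which is an assumption made only to keep the numbers reproducible): diff() returns a difftime, but c() dispatches on its first argument, the plain 0, so the result is an unlabeled numeric vector whose numbers are still in whatever unit diff() picked.

```r
# Group B timestamps from the question; tz = "UTC" is an assumption
b <- as.POSIXct(c("2017-04-03 17:53:13.217",
                  "2017-04-15 15:17:43.780",
                  "2017-04-30 13:25:00.000"), tz = "UTC")

d <- diff(b)            # a difftime; its unit is chosen automatically
print(units(d))         # "days"

# c() dispatches on its first argument, a plain 0, so the default method
# runs and the difftime class (and with it the unit label) is dropped
padded <- c(0, d)
print(class(padded))    # "numeric"
print(padded)           # 0, then roughly 11.89 and 14.92: days, but no longer labelled
```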

Why are the time differences in different units, and how can I correct this?

P.S. I could not reproduce the problem by creating the data.frame manually, so I had to read the data from a file.

UPDATE 1

diff(ts$Cl_Date) on the ungrouped column seems to be consistent: everything is in minutes. Does something break within dplyr?
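With its default units = "auto", diff() on date-times picks one unit per call from the smallest absolute gap it sees (under 60 s gives secs, under 1 hour mins, under 1 day hours, otherwise days). A grouped mutate() evaluates its expression once per group, so each group can get its own unit, whereas the ungrouped column is judged as a whole. A minimal base-R sketch with the question's timestamps (tz = "UTC" is an assumption):

```r
# Timestamps copied from the question; tz = "UTC" is an assumption
a <- as.POSIXct(c("2017-04-01 14:47:23.210", "2017-04-01 14:49:54.063",
                  "2017-04-15 15:10:42.510"), tz = "UTC")
b <- as.POSIXct(c("2017-04-03 17:53:13.217", "2017-04-15 15:17:43.780",
                  "2017-04-30 13:25:00.000"), tz = "UTC")

# One call per group, as a grouped mutate() would do it
print(units(diff(a)))         # "mins": smallest gap is about 2.5 minutes
print(units(diff(b)))         # "days": smallest gap is about 11.9 days

# The whole column at once still contains the 2.5-minute gap
print(units(diff(c(a, b))))   # "mins"
```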

UPDATE 2

ts <- ts %>% group_by(MDN) %>% arrange(Cl_Date) %>%
  mutate(time_diff_2 = Cl_Date-lag(Cl_Date))

produces the same result.
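The subtraction in UPDATE 2 follows the same rule, because subtracting two date-times calls difftime() with automatic units, and the leading NA from the lag is ignored when the unit is chosen. A base-R sketch; shift1() is a hypothetical helper standing in for dplyr::lag() so the example runs without dplyr, and tz = "UTC" is an assumption:

```r
# Group A and B timestamps from the question; tz = "UTC" is an assumption
a <- as.POSIXct(c("2017-04-01 14:47:23.210", "2017-04-01 14:49:54.063",
                  "2017-04-15 15:10:42.510"), tz = "UTC")
b <- as.POSIXct(c("2017-04-03 17:53:13.217", "2017-04-15 15:17:43.780",
                  "2017-04-30 13:25:00.000"), tz = "UTC")

# Hypothetical helper standing in for dplyr::lag(): shift right by one, NA first
shift1 <- function(x) x[c(NA, seq_len(length(x) - 1))]

# Date-time minus date-time uses difftime(..., units = "auto")
print(units(a - shift1(a)))   # "mins"
print(units(b - shift1(b)))   # "days"

# Asking for a unit explicitly gives every group the same one
print(units(difftime(a, shift1(a), units = "mins")))   # "mins"
print(units(difftime(b, shift1(b), units = "mins")))   # "mins"
```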

user1700890 asked Jun 05 '17 21:06


2 Answers

ts <- ts %>% group_by(MDN) %>% arrange(Cl_Date) %>%
  mutate(time_diff_2 = as.numeric(Cl_Date-lag(Cl_Date), units = 'mins'))

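Convert the time difference to a numeric value. The units argument (handled by as.double.difftime) converts every group to the same unit, here minutes, before the difftime class is dropped, so the returned values are consistent.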
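To see what the units argument does, here is a base-R sketch on group B's gaps (tz = "UTC" is an assumption): diff() labels them in days, and as.numeric(..., units = "mins") converts them before returning plain numbers, so they become comparable with group A's minutes.

```r
# Group B timestamps from the question; tz = "UTC" is an assumption
b <- as.POSIXct(c("2017-04-03 17:53:13.217",
                  "2017-04-15 15:17:43.780",
                  "2017-04-30 13:25:00.000"), tz = "UTC")

d <- diff(b)
print(units(d))                  # "days" (chosen automatically)

# as.numeric() on a difftime goes through as.double.difftime(), which
# converts to the requested unit before returning plain numbers
gap_mins <- as.numeric(d, units = "mins")
print(round(gap_mins, 2))        # roughly 17124.51 and 21487.27 minutes
```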

troh answered Nov 04 '22 19:11


According to @hadley here, the solution is to use lubridate instead of relying on base R.

This would be something like:

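library(lubridate)

ts %>% 
  group_by(MDN) %>% 
  arrange(Cl_Date) %>%
  # interval from the previous timestamp to the current one, as a duration in seconds
  mutate(time_diff = as.duration(lag(Cl_Date) %--% Cl_Date))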
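lubridate durations are measured in seconds, and an interval built with %--% runs from its first argument to its second, so the previous timestamp has to come first to get positive gaps. A base-R sketch of the same idea without lubridate; the indexing shift stands in for dplyr::lag() and tz = "UTC" is an assumption:

```r
# Group A timestamps from the question; tz = "UTC" is an assumption
a <- as.POSIXct(c("2017-04-01 14:47:23.210",
                  "2017-04-01 14:49:54.063",
                  "2017-04-15 15:10:42.510"), tz = "UTC")

# Previous timestamp in each position (NA first), standing in for dplyr::lag()
prev <- a[c(NA, seq_len(length(a) - 1))]

# Gap from the previous row to the current one, always in seconds
secs <- as.numeric(difftime(a, prev, units = "secs"))
print(secs)        # NA, about 150.85, about 1210848.45

# Reversing the order (current time as the start) flips the sign
print(as.numeric(difftime(prev, a, units = "secs")))
```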
jmw answered Nov 04 '22 20:11