 

How to calculate time difference with previous row of a data.frame by group


The problem I am trying to solve is this: I have a data frame with a sorted POSIXct variable. Each row belongs to a category, and I want the time difference between each row and the previous row in the same category, stored in a new variable. A reproducible example follows. The function below only creates sample data with random times for the purpose of this question.

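```r
# Generate N sorted random POSIXct times between start and end
random.time <- function(N, start, end) {
  st <- as.POSIXct(start)
  en <- as.POSIXct(end)
  dt <- as.numeric(difftime(en, st, units = "secs"))  # window length in seconds
  ev <- sort(runif(N, 0, dt))                         # sorted random offsets
  rt <- st + ev
  return(rt)
}
```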

The following code simulates the problem:

```r
set.seed(123)
category <- sample(LETTERS[1:5], 20, replace = TRUE)
randtime <- random.time(20, '2015/06/01 08:00:00', '2015/06/01 18:00:00')
df <- data.frame(category, randtime)
```

The expected result, sorted by category and time, is:

```
category  randtime             timediff (secs)
A         2015-06-01 09:05:00    0
A         2015-06-01 09:06:30   90
A         2015-06-01 09:10:00  210
B         2015-06-01 10:18:58    0
B         2015-06-01 10:19:58   60
C         2015-06-01 08:14:00    0
C         2015-06-01 08:16:30  150
```

The first row of each group should have a timediff of 0, since there is no previous row. I was able to group by category and call the following function to calculate the differences, but I could not combine the results for all categories into a single output.

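```r
getTimeDiff <- function(x) {
  no_rows <- nrow(x)
  timediff <- numeric(no_rows)  # first row stays 0: it has no previous row
  if (no_rows > 1) {
    for (i in 2:no_rows) {
      # store each difference instead of overwriting a single variable,
      # and fix the units so every group is measured in seconds
      timediff[i] <- as.numeric(difftime(x[i, "randtime"], x[i - 1, "randtime"],
                                         units = "secs"))
    }
  }
  timediff
}
```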
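A per-group calculation like the one above can be collated back into the original data frame in base R with `split()` and `unsplit()`. The sketch below is self-contained: it uses a small hypothetical data frame (`df_demo`) with fixed times taken from the expected output, a hypothetical helper `diff_secs()`, and assumes rows are already in time order within each category.

```r
# Hypothetical demo data: fixed times, rows in time order within each category
df_demo <- data.frame(
  category = c("A", "C", "A", "B", "A", "C", "B"),
  randtime = as.POSIXct(c("2015-06-01 09:05:00", "2015-06-01 08:14:00",
                          "2015-06-01 09:06:30", "2015-06-01 10:18:58",
                          "2015-06-01 09:10:00", "2015-06-01 08:16:30",
                          "2015-06-01 10:19:58"), tz = "UTC")
)

# Hypothetical helper: per-group differences in seconds, 0 for the first row
diff_secs <- function(x) {
  c(0, as.numeric(diff(x$randtime), units = "secs"))
}

# split() makes one data frame per category; unsplit() puts the per-group
# results back in the original row order
pieces <- lapply(split(df_demo, df_demo$category), diff_secs)
df_demo$timediff <- unsplit(pieces, df_demo$category)
print(df_demo)
```

Because `unsplit()` restores the original row order, the data frame does not need to be sorted by category first; it only needs to be in time order within each category.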

I have been at this for two days now without luck, so I would greatly appreciate any help. Thanks.

Mntester asked Oct 07 '15 at 18:10




1 Answer

Try this:

```r
library(dplyr)

df %>%
  arrange(category, randtime) %>%
  group_by(category) %>%
  # give the units explicitly: randtime - lag(randtime) picks units per group
  mutate(diff_secs = as.numeric(difftime(randtime, lag(randtime), units = "secs")))

#   category            randtime diff_secs
#     (fctr)              (time)     (dbl)
# 1        A 2015-06-01 11:10:54        NA
# 2        A 2015-06-01 15:35:04 15850.027
# 3        A 2015-06-01 17:01:22  5178.222
# 4        B 2015-06-01 08:14:46        NA
# 5        B 2015-06-01 16:53:43 31137.323
# 6        B 2015-06-01 17:37:48  2645.457
```
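Subtracting POSIXct values with `-` (as `randtime - lag(randtime)` does) lets R choose the difftime units from the size of the smallest gap, so different groups can come back in different units. A short base-R illustration using two gaps from the sample output (UTC is assumed only so the snippet is self-contained):

```r
# Two consecutive-row gaps taken from the sample output (UTC assumed)
gap_a <- as.POSIXct("2015-06-01 17:01:22", tz = "UTC") -
  as.POSIXct("2015-06-01 15:35:04", tz = "UTC")
gap_b <- as.POSIXct("2015-06-01 17:37:48", tz = "UTC") -
  as.POSIXct("2015-06-01 16:53:43", tz = "UTC")

units(gap_a)  # "hours": the gap is longer than an hour
units(gap_b)  # "mins": the gap is shorter than an hour

# Asking for seconds explicitly gives comparable numbers for both gaps
secs_a <- as.numeric(gap_a, units = "secs")
secs_b <- as.numeric(gap_b, units = "secs")
print(c(secs_a, secs_b))
```

Converting with an explicit `units` argument, either in `difftime()` or in `as.numeric()`, keeps every group on the same scale.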

You may also want to add `replace(is.na(.), 0)` to the chain so that the first row of each category shows 0 instead of NA, as in the expected output.
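If you would rather stay in base R, `ave()` applies a function within each group and returns the results in the original row order, so no separate collation step is needed. A minimal sketch, assuming the rows are already in time order within each category; `df_demo` is a hypothetical stand-in for the question's data:

```r
# Hypothetical stand-in data; times are in order within each category
df_demo <- data.frame(
  category = c("A", "B", "A", "B", "A"),
  randtime = as.POSIXct(c("2015-06-01 09:05:00", "2015-06-01 10:18:58",
                          "2015-06-01 09:06:30", "2015-06-01 10:19:58",
                          "2015-06-01 09:10:00"), tz = "UTC")
)

# as.numeric() on POSIXct gives seconds since the epoch, so diff() of those
# numbers is already in seconds; c(0, ...) gives each group's first row 0
df_demo$timediff <- ave(as.numeric(df_demo$randtime), df_demo$category,
                        FUN = function(t) c(0, diff(t)))
print(df_demo)
```

Working on the numeric seconds sidesteps difftime units entirely, and the `c(0, ...)` gives the 0 for the first row directly instead of an NA that has to be replaced afterwards.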

JasonAizkalns answered Oct 09 '22 at 01:10