Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding time differences by id R

Tags:

r

I have the following data(df)

Id        Timestamp                 Event
 1    2015-11-06 06:11:43           mail subscribed
 1    2015-11-06 06:15:43           Invoice created
 1    2015-11-06 09:15:43           phone call
 2    2015-11-07 08:15:43           New subscription
 2    2015-11-07 08:20:43           Added to customer list.

I am looking for the following,(Time difference for every id)

For example, Id=1 has three different events with different timings, I want to calculate the differences between the individual times between events based on Id.

Id        Timestamp                 Event                   Time Difference(Mins)
 1    2015-11-06 06:11:43           mail subscribed           0.0
 1    2015-11-06 06:15:43           Invoice created           5.0         
 1    2015-11-06 09:15:43           phone call                180.0
 2    2015-11-07 08:15:43           New subscription          0.0
 2    2015-11-07 08:20:43           Added to customer list    5.0

I tried the following code,

 diff = function(x) as.numeric(x - lag(x) )
 or diff = function (x) as.numeric(0L,diff(x))
 setDT(df)[, diff2 := diff(timestamp), by = Id]

But this code outputs irregular results. Any help?

like image 987
Knight Avatar asked Feb 09 '23 11:02

Knight


2 Answers

Try ave. No packages are used.

transform(df, Diff = ave(as.numeric(Timestamp), Id, FUN = function(x) c(0, diff(x))/60))

giving:

  Id           Timestamp                            Event Diff
1  1 2015-11-06 06:11:43                  mail subscribed    0
2  1 2015-11-06 06:15:43                  Invoice created    4
3  1 2015-11-06 09:15:43                       phone call  180
4  2 2015-11-07 08:15:43                 New subscription    0
5  2 2015-11-07 08:20:43           Added to customer list    5

Note: This was used for the input data.frame, DF

Lines <- "Id,        Timestamp,                 Event
 1,    2015-11-06 06:11:43,           mail subscribed
 1,    2015-11-06 06:15:43,          Invoice created
 1,    2015-11-06 09:15:43,          phone call
 2,    2015-11-07 08:15:43,          New subscription
 2,    2015-11-07 08:20:43,          Added to customer list"

df <- read.csv(text = Lines)
df$Timestamp <- as.POSIXct(df$Timestamp)

UPDATED as per comment.

like image 149
G. Grothendieck Avatar answered Feb 16 '23 03:02

G. Grothendieck


You can do it with the package data.table:

library(data.table)
setDT(df)[, Diff := difftime(Timestamp, Timestamp[1], units="mins"), by=Id]

df
#   Id           Timestamp                   Event     Diff
#1:  1 2015-11-06 06:11:43         mail subscribed   0 mins
#2:  1 2015-11-06 06:15:43         Invoice created   4 mins
#3:  1 2015-11-06 09:15:43              phone call 184 mins
#4:  2 2015-11-07 08:15:43        New subscription   0 mins
#5:  2 2015-11-07 08:20:43 Added to customer list.   5 mins

Edit

As per @Jaap comment, if what you need is the consecutive differences, you can do:

df[, Diff2 := difftime(Timestamp, shift(Timestamp, 1L), units = "mins"), by = Id
   ][is.na(Diff2), Diff2:=0]

df
#   Id           Timestamp                   Event     Diff    Diff2
#1:  1 2015-11-06 06:11:43         mail subscribed   0 mins   0 mins
#2:  1 2015-11-06 06:15:43         Invoice created   4 mins   4 mins
#3:  1 2015-11-06 09:15:43              phone call 184 mins 180 mins
#4:  2 2015-11-07 08:15:43        New subscription   0 mins   0 mins
#5:  2 2015-11-07 08:20:43 Added to customer list.   5 mins   5 mins
like image 39
Cath Avatar answered Feb 16 '23 04:02

Cath