I have the following data(df)
Id Timestamp Event
1 2015-11-06 06:11:43 mail subscribed
1 2015-11-06 06:15:43 Invoice created
1 2015-11-06 09:15:43 phone call
2 2015-11-07 08:15:43 New subscription
2 2015-11-07 08:20:43 Added to customer list.
I am looking for the following,(Time difference for every id)
For example, Id=1 has three different events with different timings, I want to calculate the differences between the individual times between events based on Id.
Id Timestamp Event Time Difference(Mins)
1 2015-11-06 06:11:43 mail subscribed 0.0
1 2015-11-06 06:15:43 Invoice created 5.0
1 2015-11-06 09:15:43 phone call 180.0
2 2015-11-07 08:15:43 New subscription 0.0
2 2015-11-07 08:20:43 Added to customer list 5.0
I tried the following code,
diff = function(x) as.numeric(x - lag(x) )
or diff = function (x) as.numeric(0L,diff(x))
setDT(df)[, diff2 := diff(timestamp), by = Id]
But this code outputs irregular results. Any help?
Try ave
. No packages are used.
transform(df, Diff = ave(as.numeric(Timestamp), Id, FUN = function(x) c(0, diff(x))/60))
giving:
Id Timestamp Event Diff
1 1 2015-11-06 06:11:43 mail subscribed 0
2 1 2015-11-06 06:15:43 Invoice created 4
3 1 2015-11-06 09:15:43 phone call 180
4 2 2015-11-07 08:15:43 New subscription 0
5 2 2015-11-07 08:20:43 Added to customer list 5
Note: This was used for the input data.frame, DF
Lines <- "Id, Timestamp, Event
1, 2015-11-06 06:11:43, mail subscribed
1, 2015-11-06 06:15:43, Invoice created
1, 2015-11-06 09:15:43, phone call
2, 2015-11-07 08:15:43, New subscription
2, 2015-11-07 08:20:43, Added to customer list"
df <- read.csv(text = Lines)
df$Timestamp <- as.POSIXct(df$Timestamp)
UPDATED as per comment.
You can do it with the package data.table
:
library(data.table)
setDT(df)[, Diff := difftime(Timestamp, Timestamp[1], units="mins"), by=Id]
df
# Id Timestamp Event Diff
#1: 1 2015-11-06 06:11:43 mail subscribed 0 mins
#2: 1 2015-11-06 06:15:43 Invoice created 4 mins
#3: 1 2015-11-06 09:15:43 phone call 184 mins
#4: 2 2015-11-07 08:15:43 New subscription 0 mins
#5: 2 2015-11-07 08:20:43 Added to customer list. 5 mins
Edit
As per @Jaap comment, if what you need is the consecutive differences, you can do:
df[, Diff2 := difftime(Timestamp, shift(Timestamp, 1L), units = "mins"), by = Id
][is.na(Diff2), Diff2:=0]
df
# Id Timestamp Event Diff Diff2
#1: 1 2015-11-06 06:11:43 mail subscribed 0 mins 0 mins
#2: 1 2015-11-06 06:15:43 Invoice created 4 mins 4 mins
#3: 1 2015-11-06 09:15:43 phone call 184 mins 180 mins
#4: 2 2015-11-07 08:15:43 New subscription 0 mins 0 mins
#5: 2 2015-11-07 08:20:43 Added to customer list. 5 mins 5 mins
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With