This is what my dataframe looks like.The rightmost column("TimeForLevelChange") is my desired column. When a level changes for a given Name, I want to take the minimum date from the prior level and take the date from the row where the level change happens and calculate the difference. So on third row, John's level changed from 1 to 2 and he spent 16 days(2016-01-17 - 2016-01-01) in level 1 before changing to level 2.
library(data.table)
dt <- fread('
Name Level Date RecentLevelChange TimeForLevelChange
John 1 2016-01-01 NA NA
John 1 2016-01-10 NA NA
John 2 2016-01-17 1->2 16
John 2 2016-01-18 NA NA
John 3 2016-01-22 2->3 5
John 4 2016-01-26 3->4 4
John 4 2016-01-27 NA NA
John 7 2016-01-29 4->7 3
Tom 1 2016-01-10 NA NA
Tom 2 2016-01-17 1->2 7
Tom 2 2016-01-18 NA NA
Tom 3 2016-01-22 2->3 5
Tom 4 2016-01-26 3->4 4
Tom 4 2016-01-27 NA NA
Tom 7 2016-01-29 4->7 3
')
dt[, Date := as.IDate(Date)]
I can use shift function in data.table but I dont know how to define the minimum date from the prior level for a given name.
I might do
spell = dt[,{.(
w = .I[1L],
Date = Date[1L]
)}, by=.(Name, rleid(Level))][, .(
w = tail(w,-1),
d = diff(Date)
), by=Name]
dt[spell$w, dur_lastspell := spell$d]
which gives
Name Level Date RecentLevelChange TimeForLevelChange dur_lastspell
1: John 1 2016-01-01 NA NA NA days
2: John 1 2016-01-10 NA NA NA days
3: John 2 2016-01-17 1->2 16 16 days
4: John 2 2016-01-18 NA NA NA days
5: John 3 2016-01-22 2->3 5 5 days
6: John 4 2016-01-26 3->4 4 4 days
7: John 4 2016-01-27 NA NA NA days
8: John 7 2016-01-29 4->7 3 3 days
9: Tom 1 2016-01-10 NA NA NA days
10: Tom 2 2016-01-17 1->2 7 7 days
11: Tom 2 2016-01-18 NA NA NA days
12: Tom 3 2016-01-22 2->3 5 5 days
13: Tom 4 2016-01-26 3->4 4 4 days
14: Tom 4 2016-01-27 NA NA NA days
15: Tom 7 2016-01-29 4->7 3 3 days
I use {.()}
instead of .()
because the latter gives an error. I'll report it as a bug.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With