Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculating time spent in a given level when that level changes

This is what my dataframe looks like.The rightmost column("TimeForLevelChange") is my desired column. When a level changes for a given Name, I want to take the minimum date from the prior level and take the date from the row where the level change happens and calculate the difference. So on third row, John's level changed from 1 to 2 and he spent 16 days(2016-01-17 - 2016-01-01) in level 1 before changing to level 2.

  library(data.table)
         dt <- fread('
            Name      Level     Date         RecentLevelChange TimeForLevelChange  
            John       1       2016-01-01       NA                 NA
            John       1       2016-01-10       NA                 NA
            John       2       2016-01-17       1->2               16
            John       2       2016-01-18       NA                 NA
            John       3       2016-01-22       2->3               5
            John       4       2016-01-26       3->4               4
            John       4       2016-01-27       NA                 NA
            John       7       2016-01-29       4->7               3
            Tom        1       2016-01-10       NA                NA
            Tom        2       2016-01-17       1->2              7
            Tom        2       2016-01-18       NA                NA
            Tom        3       2016-01-22       2->3              5
            Tom        4       2016-01-26       3->4              4
            Tom        4       2016-01-27       NA                NA
            Tom        7       2016-01-29       4->7              3
      ')
dt[, Date := as.IDate(Date)]

I can use shift function in data.table but I dont know how to define the minimum date from the prior level for a given name.

like image 542
gibbz00 Avatar asked Apr 28 '16 17:04

gibbz00


1 Answers

I might do

spell = dt[,{.(
    w    = .I[1L],
    Date = Date[1L]
)}, by=.(Name, rleid(Level))][, .(
  w = tail(w,-1), 
  d = diff(Date)
), by=Name]

dt[spell$w, dur_lastspell := spell$d]

which gives

    Name Level       Date RecentLevelChange TimeForLevelChange dur_lastspell
 1: John     1 2016-01-01                NA                 NA       NA days
 2: John     1 2016-01-10                NA                 NA       NA days
 3: John     2 2016-01-17              1->2                 16       16 days
 4: John     2 2016-01-18                NA                 NA       NA days
 5: John     3 2016-01-22              2->3                  5        5 days
 6: John     4 2016-01-26              3->4                  4        4 days
 7: John     4 2016-01-27                NA                 NA       NA days
 8: John     7 2016-01-29              4->7                  3        3 days
 9:  Tom     1 2016-01-10                NA                 NA       NA days
10:  Tom     2 2016-01-17              1->2                  7        7 days
11:  Tom     2 2016-01-18                NA                 NA       NA days
12:  Tom     3 2016-01-22              2->3                  5        5 days
13:  Tom     4 2016-01-26              3->4                  4        4 days
14:  Tom     4 2016-01-27                NA                 NA       NA days
15:  Tom     7 2016-01-29              4->7                  3        3 days

I use {.()} instead of .() because the latter gives an error. I'll report it as a bug.

like image 152
Frank Avatar answered Oct 11 '22 06:10

Frank