Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Lag / lead by group in R and dplyr

Tags:

r

dplyr

I'm having trouble trying to lag date grouped by team.

Data:

 df <- data.frame(Team = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D"),
             Date = c("2016-05-10","2016-05-10", "2016-05-10", "2016-05-10",
                      "2016-05-12", "2016-05-12", "2016-05-12",
                      "2016-05-15","2016-05-15",
                      "2016-05-30", "2016-05-30"), 
             Points = c(1,4,3,2,1,5,6,1,2,3,9)
             )

Team      Date       Points
 A     2016-05-10      1
 A     2016-05-10      4
 A     2016-05-10      3
 A     2016-05-10      2
 B     2016-05-12      1
 B     2016-05-12      5
 B     2016-05-12      6
 C     2016-05-15      1
 C     2016-05-15      2
 D     2016-05-30      3
 D     2016-05-30      9

Expected result:

Team      Date       Points   Date_Lagged
 A     2016-05-10      1          NA
 A     2016-05-10      4          NA
 A     2016-05-10      3          NA
 A     2016-05-10      2          NA
 B     2016-05-12      1      2016-05-10 
 B     2016-05-12      5      2016-05-10 
 B     2016-05-12      6      2016-05-10 
 C     2016-05-15      1      2016-05-12
 C     2016-05-15      2      2016-05-12
 D     2016-05-30      3      2016-05-15
 D     2016-05-30      9      2016-05-15

I'm scratching my head after I realised the following isn't the correct solution:

df %>% group_by(Date) %>% mutate(Date_lagged = lag(Date))  

Any idea how to fix it?

like image 756
ant Avatar asked Jun 23 '16 14:06

ant


Video Answer


2 Answers

The lag by default offsets with n=1. However, we have duplicate elements for 'Team', and 'Date'. Inorder to get the expected output, we need to get the distinct rows of 'Team', 'Date', create a 'Date_lagged' with the lag of 'Date' and right_join (or left_join) with the original dataset.

distinct(df, Team, Date) %>%
        mutate(Date_Lagged = lag(Date)) %>%
        right_join(., df) %>%
        select(Team, Date, Points, Date_Lagged)
#   Team       Date Points Date_Lagged
#1     A 2016-05-10      1        <NA>
#2     A 2016-05-10      4        <NA>
#3     A 2016-05-10      3        <NA>
#4     A 2016-05-10      2        <NA>
#5     B 2016-05-12      1  2016-05-10
#6     B 2016-05-12      5  2016-05-10
#7     B 2016-05-12      6  2016-05-10
#8     C 2016-05-15      1  2016-05-12
#9     C 2016-05-15      2  2016-05-12
#10    D 2016-05-30      3  2016-05-15
#11    D 2016-05-30      9  2016-05-15

Or we can also do

df %>% 
    mutate(Date_Lagged = rep(lag(unique(Date)), table(Date)))
like image 82
akrun Avatar answered Sep 21 '22 18:09

akrun


You can do this with base R too, for example using rle:

with(rle(as.character(df$Date)), rep(c(NA, head(values, -1)), lengths))
# [1] NA           NA           NA           NA           "2016-05-10" "2016-05-10"
# [7] "2016-05-10" "2016-05-12" "2016-05-12" "2016-05-15" "2016-05-15"
like image 35
talat Avatar answered Sep 24 '22 18:09

talat