I want to find the lead() and lag() element in each group, but had some wrong results.
For example, data is like this:
library(dplyr) df = data.frame(name=rep(c('Al','Jen'),3), score=rep(c(100, 80, 60),2)) df
Data:
name score 1 Al 100 2 Jen 80 3 Al 60 4 Jen 100 5 Al 80 6 Jen 60
Now I try to find out lead() and lag() scores for each person. If I sort it using arrange(), I can get the correct answer:
df %>% arrange(name) %>% group_by(name) %>% mutate(next.score = lead(score), before.score = lag(score) )
OUTPUT1:
Source: local data frame [6 x 4] Groups: name name score next.score before.score 1 Al 100 60 NA 2 Al 60 80 100 3 Al 80 NA 60 4 Jen 80 100 NA 5 Jen 100 60 80 6 Jen 60 NA 100
Without arrange(), the result is wrong:
df %>% group_by(name) %>% mutate(next.score = lead(score), before.score = lag(score) )
OUTPUT2:
Source: local data frame [6 x 4] Groups: name name score next.score before.score 1 Al 100 80 NA 2 Jen 80 60 NA 3 Al 60 100 80 4 Jen 100 80 60 5 Al 80 NA 100 6 Jen 60 NA 80
E.g., in 1st line, Al's next.score should be 60 (3rd line).
Anybody know why this happened? Why arrange() affects the result (the values, not just about the order)? Thanks~
lag lag shifts the times one back. It does not change the values, only the times. Thus lag changes the tsp attribute from c(1, 4, 1) to c(0, 3, 1) . The start time is shifted from 1 to 0, the end time is shifted from 4 to 3 and since shifts do not change the frequency the frequency remains 1.
The opposite of lag() function is lead()
It seems you have to pass additional argument to lag and lead functions. When I run your function without arrange, but with order_by added, everything seems to be ok.
df %>% group_by(name) %>% mutate(next.score = lead(score, order_by=name), before.score = lag(score, order_by=name))
Output:
name score next.score before.score 1 Al 100 60 NA 2 Jen 80 100 NA 3 Al 60 80 100 4 Jen 100 60 80 5 Al 80 NA 60 6 Jen 60 NA 100
My sessionInfo():
R version 3.1.1 (2014-07-10) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250 LC_MONETARY=Polish_Poland.1250 [4] LC_NUMERIC=C LC_TIME=Polish_Poland.1250 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] dplyr_0.4.1 loaded via a namespace (and not attached): [1] assertthat_0.1 DBI_0.3.1 lazyeval_0.1.10 magrittr_1.5 parallel_3.1.1 Rcpp_0.11.5 [7] tools_3.1.1
It may happen that stats::lag
is used instead (e.g. when restoring environments with the session
package). This can easly slip through unnoticed as it it won't throw an error when being used as in the question. Double-check by simply typing lag
, use conflicted
package, or disambiguate the function call by calling dplyr::lag
instead.
The same could happen for plyr::mutate
, in case you might have loaded plyr
package in your session. So make sure you're also calling dplyr::mutate
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With