I have a data frame containing a set of variables that I want to lag at different lengths so that I can use them in regressions later on (instead of lagging one variable at a time manually).
I found this code on Stack Overflow that seems to do the trick:
df = data.frame(a = 1:10, b = 21:30)
dplyr::mutate_all(df, dplyr::lag)  # dplyr::lag shifts values; a bare `lag` resolves to stats::lag unless dplyr is attached
a b
1 NA NA
2 1 21
3 2 22
4 3 23
5 4 24
6 5 25
7 6 26
8 7 27
9 8 28
10 9 29
The problem is that this lags every column, and I have some columns that I don't want lagged. How do I adapt the code above so that those columns are excluded? And how do I lag by different lengths? Right now it only lags by 1, the default.
I keep landing on this same Q&A and then noticing that mutate_at() and mutate_if() are now superseded by across(), which offers a slightly easier-to-remember approach for the "mutate all except these columns" pattern:
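To make the "superseded" point concrete, here is a minimal sketch, assuming dplyr 1.0 or later (where across() exists and mutate_at() is still available), that runs the old scoped-verb call next to its across() equivalent. The object names old_way and new_way are illustrative only.

```r
library(dplyr)  # attaching dplyr makes `lag` refer to dplyr::lag

df <- data.frame(a = 1:10, b = 21:30, c = 31:40, d = 41:50)

# Superseded scoped verb: lag every column except b and c
old_way <- df %>% mutate_at(vars(-b, -c), lag)

# across() equivalent: the same selection written with tidyselect's `!` and `&`
new_way <- df %>% mutate(across(!b & !c, lag))

# Both approaches should produce the same data frame
all.equal(old_way, new_way)
```

The selection syntax is the main thing that changes: vars(-b, -c) becomes a tidyselect expression passed directly as the first argument of across().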
> library(dplyr)
> df = data.frame(a = 1:10, b = 21:30, c=31:40, d=41:50)
> df
a b c d
1 1 21 31 41
2 2 22 32 42
3 3 23 33 43
4 4 24 34 44
5 5 25 35 45
6 6 26 36 46
7 7 27 37 47
8 8 28 38 48
9 9 29 39 49
10 10 30 40 50
> # everything but columns b and c
> df %>% mutate(across(!b & !c, lag))
a b c d
1 NA 21 31 NA
2 1 22 32 41
3 2 23 33 42
4 3 24 34 43
5 4 25 35 44
6 5 26 36 45
7 6 27 37 46
8 7 28 38 47
9 8 29 39 48
10 9 30 40 49
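The question also asked about lag lengths other than 1. dplyr::lag() takes an n argument, so one option is to forward it through an anonymous function inside across(). The sketch below assumes dplyr 1.0 or later; the object names (lagged2, multi, lag_fns, many) and the chosen lag lengths are illustrative, not part of the original answer.

```r
library(dplyr)

df <- data.frame(a = 1:10, b = 21:30, c = 31:40, d = 41:50)

# Single lag length: forward n to dplyr's lag() via a lambda;
# b and c are left untouched, a and d are shifted down two rows
lagged2 <- df %>% mutate(across(!b & !c, ~ lag(.x, n = 2)))

# Two explicit lag lengths: a named list of functions adds new columns
# (a_lag1, a_lag3, d_lag1, d_lag3) and keeps the original columns
multi <- df %>%
  mutate(across(c(a, d), list(lag1 = ~ lag(.x, n = 1),
                              lag3 = ~ lag(.x, n = 3))))

# Many lag lengths: build the named list of functions programmatically
# (hypothetical helper; force(k) pins each lag length inside its closure)
lags <- 1:3
lag_fns <- setNames(
  lapply(lags, function(k) {
    force(k)
    function(x) lag(x, n = k)
  }),
  paste0("lag", lags)
)
many <- df %>% mutate(across(c(a, d), lag_fns))

names(many)
```

With a named list of functions, across() names the new columns "{column}_{function name}" by default (for example a_lag2) and leaves the unlagged originals in place, which is convenient when a regression needs several lags of the same variable.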