I want to create multiple lags of multiple variables, so I thought writing a function would be helpful. My code throws a warning ("Truncating vector to length 1") and produces incorrect results:
library(dplyr)
time <- c(2000:2009, 2000:2009)
x <- c(1:10, 10:19)
id <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2)
df <- data.frame(id, time, x)
three_lags <- function (data, column, group, ordervar) {
data <- data %>%
group_by_(group) %>%
mutate(a = lag(column, 1L, NA, order_by = ordervar),
b = lag(column, 2L, NA, order_by = ordervar),
c = lag(column, 3L, NA, order_by = ordervar))
}
df_lags <- three_lags(data=df, column=x, group=id, ordervar=time) %>%
arrange(id, time)
I also wondered whether there might be a more elegant solution using mutate_each, but I didn't get that to work either. I could of course write out one line for each new lagged variable, but I'd like to avoid that.
EDIT:
akrun's dplyr answer works, but takes a long time to compute for large data frames. The solution using data.table seems to be more efficient. So a dplyr or other solution that can also be applied to several columns and several lags is still to be found.
EDIT 2:
For multiple columns and no groups (e.g. "ID") the following solution seems very well suited to me, due to its simplicity. The code may of course be shortened, but step by step:
df <- arrange(df, time)
# shift() comes from data.table. Select the columns to lag by index as [, startcol:endcol];
# n = 1:3 specifies the lags to compute (lag 1, lag 2 and lag 3 in this case).
df.lag <- shift(df[, 1:24], n = 1:3, give.names = TRUE)
df.result <- bind_cols(df, df.lag)
We can use shift from data.table, which can take multiple values for 'n':
library(data.table)
setDT(df)[order(time), c("a", "b", "c") := shift(x, 1:3) , id][order(id, time)]
Suppose we need to do this on multiple columns:
df$y <- df$x
setDT(df)[order(time), paste0(rep(c("x", "y"), each =3),
c("a", "b", "c")) :=shift(.SD, 1:3), id, .SDcols = x:y]
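The multi-column call above could be wrapped in a small helper so that the columns and lags are passed as arguments. This is only a sketch: the function name add_lags, its argument names, and the "_lag" naming scheme are made up for illustration and are not part of data.table. It relies on shift() returning, for a list input, all lags of the first column, then all lags of the second, and so on.

```r
library(data.table)

# Example data from the question, plus a second column y
df <- data.frame(id = rep(c(1, 2), each = 10),
                 time = rep(2000:2009, 2),
                 x = c(1:10, 10:19))
df$y <- df$x

# Hypothetical helper: add the requested lags of every column in `cols`,
# computed within the groups given by `group_cols`, after sorting by `order_col`
add_lags <- function(data, cols, lags, group_cols, order_col) {
  dt <- copy(as.data.table(data))          # copy so the input is not modified by reference
  setorderv(dt, c(group_cols, order_col))  # sort within groups before shifting
  # Name order matches shift(): all lags of the first column, then the next column
  new_names <- paste0(rep(cols, each = length(lags)), "_lag", lags)
  dt[, (new_names) := shift(.SD, lags), by = group_cols, .SDcols = cols]
  dt[]
}

res <- add_lags(df, cols = c("x", "y"), lags = 1:3,
                group_cols = "id", order_col = "time")
```

Here res should contain the new columns x_lag1, x_lag2, x_lag3, y_lag1, y_lag2 and y_lag3, with the lags restarting at NA for each id.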
shift can also be used within a dplyr pipeline:
library(dplyr)
df %>%
group_by(id) %>%
arrange(id, time) %>%
do(data.frame(., setNames(shift(.$x, 1:3), c("a", "b", "c"))))
# id time x a b c
# <dbl> <int> <int> <int> <int> <int>
#1 1 2000 1 NA NA NA
#2 1 2001 2 1 NA NA
#3 1 2002 3 2 1 NA
#4 1 2003 4 3 2 1
#5 1 2004 5 4 3 2
#6 1 2005 6 5 4 3
#7 1 2006 7 6 5 4
#8 1 2007 8 7 6 5
#9 1 2008 9 8 7 6
#10 1 2009 10 9 8 7
#11 2 2000 10 NA NA NA
#12 2 2001 11 10 NA NA
#13 2 2002 12 11 10 NA
#14 2 2003 13 12 11 10
#15 2 2004 14 13 12 11
#16 2 2005 15 14 13 12
#17 2 2006 16 15 14 13
#18 2 2007 17 16 15 14
#19 2 2008 18 17 16 15
#20 2 2009 19 18 17 16
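For reference, the function from the question can probably be made to work by forwarding the bare column names with the embracing operator {{ }}, so that they are looked up inside the data frame. This is a sketch that assumes a version of dplyr/rlang recent enough to support {{ }}; it keeps the asker's function and argument names.

```r
library(dplyr)

# Example data from the question
df <- data.frame(id = rep(c(1, 2), each = 10),
                 time = rep(2000:2009, 2),
                 x = c(1:10, 10:19))

# The asker's three_lags(), with the column arguments embraced so that
# x, id and time are evaluated as columns of `data`
three_lags <- function(data, column, group, ordervar) {
  data %>%
    group_by({{ group }}) %>%
    mutate(a = dplyr::lag({{ column }}, 1L, order_by = {{ ordervar }}),
           b = dplyr::lag({{ column }}, 2L, order_by = {{ ordervar }}),
           c = dplyr::lag({{ column }}, 3L, order_by = {{ ordervar }})) %>%
    ungroup()
}

df_lags <- three_lags(df, column = x, group = id, ordervar = time) %>%
  arrange(id, time)
```

This still needs one line per lag, so it addresses the warning rather than the wish for a more compact solution.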
You could also create a function that returns a tibble:
library(tidyverse)
lag_multiple <- function(x, n_vec){
map(n_vec, lag, x = x) %>%
set_names(paste0("lag", n_vec)) %>%
as_tibble()
}
tibble(x = 1:30) %>%
mutate(lag_multiple(x, 1:5))
#> # A tibble: 30 x 6
#> x lag1 lag2 lag3 lag4 lag5
#> <int> <int> <int> <int> <int> <int>
#> 1 1 NA NA NA NA NA
#> 2 2 1 NA NA NA NA
#> 3 3 2 1 NA NA NA
#> 4 4 3 2 1 NA NA
#> 5 5 4 3 2 1 NA
#> 6 6 5 4 3 2 1
#> 7 7 6 5 4 3 2
#> 8 8 7 6 5 4 3
#> 9 9 8 7 6 5 4
#> 10 10 9 8 7 6 5
#> # ... with 20 more rows
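For several columns and several lags within groups, one possible dplyr-only variant is to build a named list of lag functions and hand it to across(). This is a sketch assuming dplyr 1.0 or later; the names lags and lag_funs are illustrative, and the new columns get across()'s default "{.col}_{.fn}" names.

```r
library(dplyr)

# Example data from the question, plus a second column y
df <- data.frame(id = rep(c(1, 2), each = 10),
                 time = rep(2000:2009, 2),
                 x = c(1:10, 10:19))
df$y <- df$x

lags <- 1:3

# One function per lag; force(n) makes sure each closure keeps its own n
lag_funs <- lapply(lags, function(n) {
  force(n)
  function(v) dplyr::lag(v, n)
})
names(lag_funs) <- paste0("lag", lags)

out <- df %>%
  group_by(id) %>%
  arrange(time, .by_group = TRUE) %>%   # sort by time within each id before lagging
  mutate(across(c(x, y), lag_funs)) %>% # creates x_lag1, x_lag2, ..., y_lag3
  ungroup()
```

Whether this is fast enough on large data frames has not been measured here; the data.table approach may still be the better choice for that case.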