debugging: function to create multiple lags for multiple columns (dplyr)

Tags: r, dplyr
I want to create multiple lags of multiple variables, so I thought writing a function would be helpful. My code throws a warning ("Truncating vector to length 1") and returns wrong results:

library(dplyr)
time <- c(2000:2009, 2000:2009)
x <- c(1:10, 10:19)
id <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2)
df <- data.frame(id, time, x)



three_lags <- function (data, column, group, ordervar) {
  data <- data %>% 
    group_by_(group) %>%
    mutate(a = lag(column, 1L, NA, order_by = ordervar),
            b = lag(column, 2L, NA, order_by = ordervar),
            c = lag(column, 3L, NA, order_by = ordervar)) 
  }

df_lags <- three_lags(data=df, column=x, group=id, ordervar=time) %>%
  arrange(id, time)

I also wondered whether there might be a more elegant solution using mutate_each, but I didn't get that to work either. I can of course write out one line of code for each new lagged variable, but I'd like to avoid that.

EDIT:

akrun's dplyr answer works, but takes a long time to compute for large data frames. The solution using data.table seems to be more efficient. So a dplyr (or other) solution that can also be applied to several columns and several lags is still to be found.

EDIT 2:

For multiple columns and no groups (e.g. no "ID"), the following solution seems well suited to me because of its simplicity. The code could of course be shortened, but step by step:

library(data.table)  # provides shift()
library(dplyr)       # provides arrange() and bind_cols()

df <- arrange(df, time)

# df[, 1:24] selects the columns to be lagged as [, startcol:endcol];
# n = 1:3 specifies the lags (lag 1, lag 2 and lag 3 in this case)
df.lag <- shift(df[, 1:24], n = 1:3, give.names = TRUE)

df.result <- bind_cols(df, df.lag)
yoland asked Jun 30 '16 09:06


2 Answers

We can use shift from data.table, which accepts multiple values for 'n':

library(data.table)
setDT(df)[order(time), c("a", "b", "c") := shift(x, 1:3) , id][order(id, time)]

Suppose we need to do this for multiple columns:

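df$y <- df$x
# Name the columns explicitly: the range x:y would also pick up any
# columns that sit between x and y (such as a, b, c created above)
setDT(df)[order(time), paste0(rep(c("x", "y"), each = 3), 
                c("a", "b", "c")) := shift(.SD, 1:3), id, .SDcols = c("x", "y")]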
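To make the multi-column idea reusable, the call above could be wrapped in a small helper. This is only a sketch: add_lags_dt and its argument names are hypothetical (not part of data.table), and it assumes the group and order columns are passed as character strings.

```r
library(data.table)

# Hypothetical helper (not part of data.table): adds one column per
# (column, lag) pair, computed within each group.
add_lags_dt <- function(data, cols, lags, group_col, order_col) {
  dt <- copy(as.data.table(data))         # copy so the caller's object is untouched
  setorderv(dt, c(group_col, order_col))  # sort once so shift() follows the order column
  # Names vary fastest over lags, matching the order of shift(.SD, lags)
  new_names <- paste0(rep(cols, each = length(lags)), "_lag", lags)
  dt[, (new_names) := shift(.SD, lags), by = group_col, .SDcols = cols]
  dt[]
}

# Same example data as in the question, plus a second column y
df <- data.frame(id = rep(1:2, each = 10),
                 time = rep(2000:2009, times = 2),
                 x = c(1:10, 10:19))
df$y <- df$x

res_dt <- add_lags_dt(df, cols = c("x", "y"), lags = 1:3,
                      group_col = "id", order_col = "time")
```

The result should contain x_lag1 to x_lag3 and y_lag1 to y_lag3 next to the original columns, with NA in the first rows of each id.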

shift can also be used within a dplyr pipeline:

library(dplyr)
df %>% 
  group_by(id) %>% 
  arrange(id, time) %>% 
  do(data.frame(., setNames(shift(.$x, 1:3), c("a", "b", "c"))))
#    id  time     x     a     b     c
#   <dbl> <int> <int> <int> <int> <int>
#1      1  2000     1    NA    NA    NA
#2      1  2001     2     1    NA    NA
#3      1  2002     3     2     1    NA
#4      1  2003     4     3     2     1
#5      1  2004     5     4     3     2
#6      1  2005     6     5     4     3
#7      1  2006     7     6     5     4
#8      1  2007     8     7     6     5
#9      1  2008     9     8     7     6
#10     1  2009    10     9     8     7
#11     2  2000    10    NA    NA    NA
#12     2  2001    11    10    NA    NA
#13     2  2002    12    11    10    NA
#14     2  2003    13    12    11    10
#15     2  2004    14    13    12    11
#16     2  2005    15    14    13    12
#17     2  2006    16    15    14    13
#18     2  2007    17    16    15    14
#19     2  2008    18    17    16    15
#20     2  2009    19    18    17    16
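For several columns and several lags in pure dplyr, one possible sketch uses across() with a named list of lag functions. It assumes dplyr 1.0.0 or later; add_lags_dplyr and its argument names are hypothetical, and whether it is faster than the do() approach on large data has not been measured here.

```r
library(dplyr)

# Hypothetical helper: column names are passed as character strings
add_lags_dplyr <- function(data, cols, lags, group_col, order_col) {
  # One closure per lag; force(n) pins the lag value inside each function
  lag_fns <- lapply(lags, function(n) {
    force(n)
    function(x) dplyr::lag(x, n)
  })
  names(lag_fns) <- paste0("lag", lags)

  data %>%
    arrange(across(all_of(c(group_col, order_col)))) %>%
    group_by(across(all_of(group_col))) %>%
    # Apply every lag function to every selected column
    mutate(across(all_of(cols), lag_fns, .names = "{.col}_{.fn}")) %>%
    ungroup()
}

# Same example data as in the question, plus a second column y
df <- data.frame(id = rep(1:2, each = 10),
                 time = rep(2000:2009, times = 2),
                 x = c(1:10, 10:19))
df$y <- df$x

res_dplyr <- add_lags_dplyr(df, cols = c("x", "y"), lags = 1:3,
                            group_col = "id", order_col = "time")
```

Because the data are sorted before grouping, lag() needs no order_by argument here.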
akrun answered Jan 25 '23 04:01


You could also create a function that returns a tibble:

library(tidyverse)

lag_multiple <- function(x, n_vec){
  map(n_vec, lag, x = x) %>% 
    set_names(paste0("lag", n_vec)) %>% 
    as_tibble()
}

tibble(x = 1:30) %>% 
  mutate(lag_multiple(x, 1:5))
#> # A tibble: 30 x 6
#>        x  lag1  lag2  lag3  lag4  lag5
#>    <int> <int> <int> <int> <int> <int>
#>  1     1    NA    NA    NA    NA    NA
#>  2     2     1    NA    NA    NA    NA
#>  3     3     2     1    NA    NA    NA
#>  4     4     3     2     1    NA    NA
#>  5     5     4     3     2     1    NA
#>  6     6     5     4     3     2     1
#>  7     7     6     5     4     3     2
#>  8     8     7     6     5     4     3
#>  9     9     8     7     6     5     4
#> 10    10     9     8     7     6     5
#> # ... with 20 more rows
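The same function should also work per group, which is what the question asks for. A minimal sketch, assuming dplyr 1.0.0 or later (where mutate() accepts an unnamed data frame) and reusing the question's example data:

```r
library(dplyr)
library(purrr)
library(tibble)

# Same function as above, repeated so this block runs on its own
lag_multiple <- function(x, n_vec) {
  map(n_vec, lag, x = x) %>%
    set_names(paste0("lag", n_vec)) %>%
    as_tibble()
}

df <- data.frame(id = rep(1:2, each = 10),
                 time = rep(2000:2009, times = 2),
                 x = c(1:10, 10:19))

res_grouped <- df %>%
  group_by(id) %>%
  arrange(time, .by_group = TRUE) %>%  # sort within each id before lagging
  mutate(lag_multiple(x, 1:3)) %>%     # lags are computed separately per id
  ungroup()
```

The lag columns restart with NA at the first rows of each id instead of carrying values over from the previous group.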
Bryan Shalloway answered Jan 25 '23 02:01