Lag multiple variables multiple times in R

Tags: r, series, lag

So, I'm working with a data frame that has daily data over a period of 444 days. I have several variables that I want to lag for use in a regression model (lm). I want to lag them 7 times each. I'm currently generating the lags like this...

email_data$email_reach1 <- lag(ts(email_data$email_reach, start = 1, end = 444), 1)
email_data$email_reach2 <- lag(ts(email_data$email_reach, start = 1, end = 444), 2)
email_data$email_reach3 <- lag(ts(email_data$email_reach, start = 1, end = 444), 3)
email_data$email_reach4 <- lag(ts(email_data$email_reach, start = 1, end = 444), 4)
email_data$email_reach5 <- lag(ts(email_data$email_reach, start = 1, end = 444), 5)
email_data$email_reach6 <- lag(ts(email_data$email_reach, start = 1, end = 444), 6)
email_data$email_reach7 <- lag(ts(email_data$email_reach, start = 1, end = 444), 7)

Then, I repeat this for every single variable I want to lag.

This seems like a terrible way of accomplishing this. Is there something better?

I've thought about lagging the entire data frame, which works, but I don't know how you'd assign variable names to the result and merge it back to the original data frame.

John Chrysostom asked Apr 06 '15 16:04


4 Answers

You can also use data.table. (HT to @akrun)

set.seed(1)
email_data <- data.frame(dates=1:10, email_reach=rbinom(10, 10, 0.5))

library(data.table)
setDT(email_data)[, paste0('email_reach', 1:3) := shift(email_reach, 1:3)][]

#   dates email_reach email_reach1 email_reach2 email_reach3
# 1:     1           4           NA           NA           NA
# 2:     2           4            4           NA           NA
# 3:     3           5            4            4           NA
# 4:     4           7            5            4            4
# 5:     5           4            7            5            4
# 6:     6           7            4            7            5
# 7:     7           7            7            4            7
# 8:     8           6            7            7            4
# 9:     9           6            6            7            7
#10:    10           3            6            6            7
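The question asks about lagging several variables, not just one. A minimal sketch of how the same `shift()` call might be extended, assuming a second, made-up column `web_visits` (not in the original data) and the same naming scheme of variable name plus lag number:

```r
library(data.table)

set.seed(1)
# Toy data; `web_visits` is a hypothetical second variable for illustration.
dt <- data.table(dates       = 1:10,
                 email_reach = rbinom(10, 10, 0.5),
                 web_visits  = rpois(10, 20))

vars <- c("email_reach", "web_visits")  # columns to lag
lags <- 1:3                             # lags to create for each column

for (v in vars) {
  new_cols <- paste0(v, lags)           # e.g. email_reach1, email_reach2, ...
  # shift() returns one vector per requested lag, padded with NA at the top
  dt[, (new_cols) := shift(dt[[v]], n = lags)]
}
```

The loop adds the columns by reference, so `dt` ends up with the original columns plus `length(vars) * length(lags)` lagged ones.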
Khashaa answered Sep 25 '22 09:09


Another approach is to use the xts package. A small example follows. We start with:

x <- ts(matrix(rnorm(100),ncol=2), start=c(2009, 1), frequency=12) 
head(x)
        Series 1   Series 2
[1,] -1.82934747 -0.1234372
[2,]  1.08371836  1.3365919
[3,]  0.95786815  0.0885484
[4,]  0.59301446 -0.6984993
[5,] -0.01094955 -0.3729762
[6,] -0.19256525  0.3137705

Convert it to xts and call lag(), here with lags 0, 1, and 2 to keep the output short:

library(xts)
head(lag(as.xts(x),0:2))
            Series.1   Series.2  Series.1.1 Series.2.1 Series.1.2 Series.2.2
jan 2009 -1.82934747 -0.1234372          NA         NA         NA         NA
feb 2009  1.08371836  1.3365919 -1.82934747 -0.1234372         NA         NA
mar 2009  0.95786815  0.0885484  1.08371836  1.3365919 -1.8293475 -0.1234372
apr 2009  0.59301446 -0.6984993  0.95786815  0.0885484  1.0837184  1.3365919
maj 2009 -0.01094955 -0.3729762  0.59301446 -0.6984993  0.9578682  0.0885484
jun 2009 -0.19256525  0.3137705 -0.01094955 -0.3729762  0.5930145 -0.6984993
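The question also asks how to name the lagged columns and get them back into a data frame. One possible sketch, assuming two made-up daily series `a` and `b` and a `_lagK` naming scheme (both are illustrative choices, not part of the original answer):

```r
library(xts)

# Toy data: two hypothetical daily series, "a" and "b".
set.seed(1)
x <- xts(cbind(a = rnorm(10), b = rnorm(10)),
         order.by = as.Date("2015-01-01") + 0:9)

k <- 1:2
lagged <- lag(x, k)

# Name the columns explicitly rather than relying on the generated names.
# This assumes the column ordering seen in the output above: all series at
# the first lag, then all series at the second lag, and so on.
colnames(lagged) <- paste0(rep(colnames(x), times = length(k)),
                           "_lag", rep(k, each = ncol(x)))

# Bind the lags to the original series and convert back to a data frame.
result <- data.frame(date = index(x), coredata(merge(x, lagged)))
```

Setting the names by hand keeps the result independent of whatever suffixes the installed xts version generates.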
J.R. answered Sep 25 '22 09:09


I think this does the same as your code above, for any given n.

n <- 7
for (i in 1:n) {
  email_data[[paste0("email_reach", i)]] <- lag(ts(email_data$email_reach, start = 1, end = 444), i)  
}
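If the goal is columns whose values are actually shifted down by i rows, a base R variant of the same loop could look like the sketch below. The helpers `lag_vec` and `add_lags` are hypothetical names introduced here for illustration, and `web_visits` is a made-up second variable:

```r
# Hypothetical helper: shift a plain vector down by k positions, padding with NA.
lag_vec <- function(x, k) {
  n <- length(x)
  c(rep(NA, min(k, n)), x[seq_len(max(n - k, 0))])
}

# Hypothetical helper: add lagged copies of several columns to a data frame.
add_lags <- function(df, vars, lags) {
  for (v in vars) {
    for (k in lags) {
      df[[paste0(v, k)]] <- lag_vec(df[[v]], k)
    }
  }
  df
}

set.seed(1)
# Toy data standing in for the real 444-row data frame.
email_data <- data.frame(dates       = 1:10,
                         email_reach = rbinom(10, 10, 0.5),
                         web_visits  = rpois(10, 20))

email_data <- add_lags(email_data, c("email_reach", "web_visits"), 1:7)
```

The first k rows of each lag-k column are NA. With the default `na.action` setting, `lm` drops rows containing NA, so the first 7 days would not enter the fit.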
Molx answered Sep 24 '22 09:09


This isn't really an answer; I'm just using the answer format to elaborate on my warning above:

email_data <- data.frame( email_reach=ts(email_data$email_reach, start = 1, end = 444))

Then run your code, and this is what you get (note that none of the columns are actually shifted):

> head(email_data, 10)
   email_reach email_reach1 email_reach2 email_reach3 email_reach4
1            4            4            4            4            4
2            4            4            4            4            4
3            5            5            5            5            5
4            7            7            7            7            7
5            4            4            4            4            4
6            7            7            7            7            7
7            7            7            7            7            7
8            6            6            6            6            6
9            6            6            6            6            6
10           3            3            3            3            3
   email_reach5 email_reach6 email_reach7
1             4            4            4
2             4            4            4
3             5            5            5
4             7            7            7
5             4            4            4
6             7            7            7
7             7            7            7
8             6            6            6
9             6            6            6
10            3            3            3
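A small check of why the columns come out identical, using a short made-up series: `stats::lag()` on a `ts` object moves the time index (the `tsp` attribute) and leaves the underlying values in place, so the shift is lost once the result is stored as a plain data frame column.

```r
# Short made-up series for illustration.
x  <- ts(c(4, 4, 5, 7, 4))
x1 <- stats::lag(x, 1)

tsp(x)    # start, end, frequency of the original series
tsp(x1)   # same frequency, but start and end are moved by one period

# The stored values are untouched; only the time attributes differ.
same_values <- identical(as.numeric(x), as.numeric(x1))
```

Here `same_values` is `TRUE`, which matches the table above.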
IRTFM answered Sep 23 '22 09:09