So, I'm working with a data frame that has daily data over a period of 444 days. I have several variables that I want to lag for use in a regression model (lm()). I want seven lags of each one (lags 1 through 7). I'm currently generating the lags like this:
email_data$email_reach1 <- lag(ts(email_data$email_reach, start = 1, end = 444), 1)
email_data$email_reach2 <- lag(ts(email_data$email_reach, start = 1, end = 444), 2)
email_data$email_reach3 <- lag(ts(email_data$email_reach, start = 1, end = 444), 3)
email_data$email_reach4 <- lag(ts(email_data$email_reach, start = 1, end = 444), 4)
email_data$email_reach5 <- lag(ts(email_data$email_reach, start = 1, end = 444), 5)
email_data$email_reach6 <- lag(ts(email_data$email_reach, start = 1, end = 444), 6)
email_data$email_reach7 <- lag(ts(email_data$email_reach, start = 1, end = 444), 7)
Then, I repeat this for every single variable I want to lag.
This seems like a terrible way of accomplishing this. Is there something better?
I've thought about lagging the entire data frame, which works, but I don't know how to assign variable names to the result and merge it back into the original data frame.
You can also use data.table (HT to @akrun):
set.seed(1)
email_data <- data.frame(dates=1:10, email_reach=rbinom(10, 10, 0.5))
library(data.table)
# shift() with n = 1:3 returns lags 1, 2 and 3; := adds them as new columns by reference
setDT(email_data)[, paste0('email_reach', 1:3) := shift(email_reach, 1:3)][]
#     dates email_reach email_reach1 email_reach2 email_reach3
#  1:     1           4           NA           NA           NA
#  2:     2           4            4           NA           NA
#  3:     3           5            4            4           NA
#  4:     4           7            5            4            4
#  5:     5           4            7            5            4
#  6:     6           7            4            7            5
#  7:     7           7            7            4            7
#  8:     8           6            7            7            4
#  9:     9           6            6            7            7
# 10:    10           3            6            6            7
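If you would rather not add a package dependency, base R's embed() can build the same lag columns. This is a sketch of an alternative, not part of the answer above: padding the vector with max_lag leading NAs before calling embed(x, max_lag + 1) keeps one row per original observation, with column k + 1 holding lag k. The email_reach values are hard-coded from the printed output above (an assumption, so the result can be compared with the data.table columns).

```r
# email_reach values copied from the data.table output above (assumed data)
email_reach <- c(4, 4, 5, 7, 4, 7, 7, 6, 6, 3)
max_lag <- 3

# Pad with max_lag leading NAs so embed() returns one row per original value.
# Column 1 is the series itself; column k + 1 is the series lagged by k.
lags <- embed(c(rep(NA, max_lag), email_reach), max_lag + 1)
colnames(lags) <- c("email_reach", paste0("email_reach", 1:max_lag))

email_data <- data.frame(dates = 1:10, lags)
print(email_data)
```

Changing max_lag to 7 gives the seven lags asked about in the question.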
Another approach is to use the xts package. Here is a small example; we start out with:
x <- ts(matrix(rnorm(100),ncol=2), start=c(2009, 1), frequency=12)
head(x)
        Series 1   Series 2
[1,] -1.82934747 -0.1234372
[2,]  1.08371836  1.3365919
[3,]  0.95786815  0.0885484
[4,]  0.59301446 -0.6984993
[5,] -0.01094955 -0.3729762
[6,] -0.19256525  0.3137705
Convert it to xts and call lag(), here with lags 0, 1 and 2 to keep the output short:
library(xts)
head(lag(as.xts(x),0:2))
            Series.1   Series.2  Series.1.1 Series.2.1 Series.1.2 Series.2.2
Jan 2009 -1.82934747 -0.1234372          NA         NA         NA         NA
Feb 2009  1.08371836  1.3365919 -1.82934747 -0.1234372         NA         NA
Mar 2009  0.95786815  0.0885484  1.08371836  1.3365919 -1.8293475 -0.1234372
Apr 2009  0.59301446 -0.6984993  0.95786815  0.0885484  1.0837184  1.3365919
May 2009 -0.01094955 -0.3729762  0.59301446 -0.6984993  0.9578682  0.0885484
Jun 2009 -0.19256525  0.3137705 -0.01094955 -0.3729762  0.5930145 -0.6984993
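For comparison, base R's embed() also accepts a matrix, and it orders its output the same way as the xts result above: all series at lag 0, then all series at lag 1, and so on. The sketch below is a base-R alternative, not the xts API; the small input matrix m is made-up illustration data, and the column-naming scheme only imitates the Series.1.1 style shown above.

```r
# Hypothetical two-series matrix, made up for illustration
m <- cbind(Series.1 = c(1, 2, 3, 4), Series.2 = c(10, 20, 30, 40))
lags <- 0:2
k <- max(lags)

# Pad with k rows of NA, then embed with dimension k + 1.
# Output column order: all series at lag 0, all at lag 1, all at lag 2.
padded <- rbind(matrix(NA, k, ncol(m)), m)
out <- embed(padded, k + 1)

# Imitate the xts naming: no suffix for lag 0, ".<lag>" otherwise
suffix <- ifelse(lags == 0, "", paste0(".", lags))
colnames(out) <- paste0(rep(colnames(m), times = length(lags)),
                        rep(suffix, each = ncol(m)))
print(out)
```

The result is a plain matrix with no time index, so it can be passed to cbind() with the original data frame as long as the rows are already in time order.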
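I think this does the same as your code above, for any given n:
n <- 7
for (i in 1:n) {
  # One column per lag: email_reach1 ... email_reach<n>.
  # stats::lag() only shifts the ts time attributes, so the stored values are not shifted.
  email_data[[paste0("email_reach", i)]] <- lag(ts(email_data$email_reach, start = 1, end = 444), i)
}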
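As the next answer demonstrates, storing lag(ts(...), i) in a data frame does not actually shift the values, so this loop inherits the same problem as the original code. A minimal corrected sketch, assuming the goal is the usual lag (row t holds the value from row t - i, with the first i rows set to NA), shifts the vector itself. The helper shift_back() and the extra column other_var are hypothetical names for illustration; the outer loop over variable names removes the need to repeat the code for each variable.

```r
# Hypothetical helper: lag a vector by k positions, padding the start with NA.
# Assumes 0 <= k < length(x).
shift_back <- function(x, k) {
  if (k == 0) return(x)
  c(rep(NA, k), head(x, -k))
}

# Hypothetical example data; other_var stands in for any further variable
email_data <- data.frame(email_reach = c(4, 4, 5, 7, 4, 7, 7, 6, 6, 3),
                         other_var   = 1:10)

n <- 7
vars <- c("email_reach", "other_var")
for (v in vars) {
  for (i in 1:n) {
    # e.g. email_reach3 holds email_reach from three rows earlier
    email_data[[paste0(v, i)]] <- shift_back(email_data[[v]], i)
  }
}
```

Because the new columns are added to email_data directly, there is nothing to merge back afterwards.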
This isn't really an answer, just using the answer format as an elaboration of my warning above:
email_data <- data.frame(email_reach = ts(email_data$email_reach, start = 1, end = 444))
Then run your code, and this is what you get:
> head(email_data, 10)
   email_reach email_reach1 email_reach2 email_reach3 email_reach4
1            4            4            4            4            4
2            4            4            4            4            4
3            5            5            5            5            5
4            7            7            7            7            7
5            4            4            4            4            4
6            7            7            7            7            7
7            7            7            7            7            7
8            6            6            6            6            6
9            6            6            6            6            6
10           3            3            3            3            3
   email_reach5 email_reach6 email_reach7
1             4            4            4
2             4            4            4
3             5            5            5
4             7            7            7
5             4            4            4
6             7            7            7
7             7            7            7
8             6            6            6
9             6            6            6
10            3            3            3
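The reason is that stats::lag() on a ts object does not move any values; it only shifts the series' time attributes (tsp). Once the result is stored as a plain data-frame column, that time information is dropped and the column is just a copy of the original. Note also that stats::lag(x, 1) moves the series earlier in time (a lead); the conventional lag is stats::lag(x, -1), and the values only line up when ts objects are combined with a time-aware operation such as cbind() on ts objects. (If dplyr is loaded, its lag() masks stats::lag() and shifts the values directly; that is a different function.) The sketch below uses the first ten values from the table above as assumed example data.

```r
x <- ts(c(4, 4, 5, 7, 4, 7, 7, 6, 6, 3))   # default start = 1, frequency = 1

lead1 <- stats::lag(x, 1)
print(tsp(x))       # start, end, frequency of the original series
print(tsp(lead1))   # time base moved one step earlier

# The data themselves are unchanged, which is why the data-frame columns match
print(identical(as.numeric(lead1), as.numeric(x)))

# Aligning on time: cbind() on ts objects takes the union of the time ranges,
# and window() trims it back to the original span. lag(x, -1) is the usual lag.
aligned <- window(cbind(email_reach = x, email_reach1 = stats::lag(x, -1)),
                  start = 1, end = 10)
print(aligned)
```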