
How can I automatically create n lags in a timeseries?

Tags:

r

I have a data frame with a column t. I want to create n lagged columns with names like t-1, t-2, etc.

  year      t  t-1 t-2
19620101    1   NA  NA
19630102    2   1   NA
19640103    3   2   1
19650104    4   3   2
19650104    5   4   3
19650104    6   5   4

My idea is that I will do it in four steps:

  • A loop for the column names using "paste"
  • A loop for the temporary dataframes for lagged columns using "paste"
  • A loop for creating the lagged columns
  • cbind them.

But I am not able to proceed with the code. Something rough:

df_final<-lagged(df="odd",n=3)

lagged<-function(df,n){
   df<-zoo(df)
   lags<-paste("A", 1:n, sep ="_")
   for (i in 1:5) {
     odd<-as.data.frame(lag(odd$OBS_Q,-1*i,na.pad =  TRUE))

   #Cbind here
   } 

I am stuck in writing this function. Could you please show some way? Or a different simpler way of doing this....

Reference: Basic lag in R vector/dataframe


Addendum:

Real data:

x<-structure(list(DATE = 19630101:19630104, PRECIP = c(0, 0, 0,0), 
               OBS_Q = c(1.61, 1.48, 1.4, 1.33), swb = c(1.75, 1.73, 1.7,1.67), 
               gr4j = c(1.9, 1.77, 1.67, 1.58), isba = c(0.83, 0.83,0.83, 0.83), 
               noah = c(1.31, 1.19, 1.24, 1.31), sac = c(1.99,1.8, 1.66, 1.57), 
               swap = c(1.1, 1.05, 1.08, 0.99), vic.mm.day. = c(2.1,1.75, 1.55, 1.43)), 
          .Names = c("DATE", "PRECIP", "OBS_Q", "swb","gr4j", "isba", "noah", "sac", "swap", "vic.mm.day."), 
          class = c("data.table","data.frame"), row.names = c(NA, -4L))

The column to be lagged is OBS_Q.

maximusdooku asked Jan 20 '15 21:01


2 Answers

I might build something around base R's embed():

x <- c(rep(NA,2),1:6)
embed(x,3)
#      [,1] [,2] [,3]
# [1,]    1   NA   NA
# [2,]    2    1   NA
# [3,]    3    2    1
# [4,]    4    3    2
# [5,]    5    4    3
# [6,]    6    5    4

Perhaps something like this:

f <- function(x, dimension, pad) {
    if(!missing(pad)) {
        x <- c(rep(pad, dimension-1), x)
    }
    embed(x, dimension)
}
f(1:6, dimension=3, pad=NA)
#      [,1] [,2] [,3]
# [1,]    1   NA   NA
# [2,]    2    1   NA
# [3,]    3    2    1
# [4,]    4    3    2
# [5,]    5    4    3
# [6,]    6    5    4
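
To get from that matrix to the named columns the question asks for, something like the following might work. This is only a sketch: add_lags is a hypothetical helper name (not part of base R), and the "t-1", "t-2" naming simply follows the question's example.

```r
# Hypothetical helper: append n lagged copies of one column to a data frame.
add_lags <- function(df, col, n) {
  x <- df[[col]]
  # Pad with n leading NAs so embed() returns one row per original observation.
  m <- embed(c(rep(NA, n), x), n + 1)
  # Column 1 of m is the unlagged series; columns 2..(n + 1) are lags 1..n.
  lags <- as.data.frame(m[, -1, drop = FALSE])
  names(lags) <- paste0(col, "-", seq_len(n))
  # cbind() on data frames keeps non-syntactic names such as "t-1".
  cbind(df, lags)
}

df <- data.frame(t = 1:6)
add_lags(df, "t", 2)
```

Because the names are non-syntactic, the new columns have to be accessed with backticks or [[ ]], e.g. df[["t-1"]].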
Josh O'Brien answered Oct 08 '22 19:10


If you are looking for efficiency, try data.table's new shift function:

library(data.table) # V >= 1.9.5
n <- 2
setDT(df)[, paste("t", 1:n) := shift(t, 1:n)][]
#    t t 1 t 2
# 1: 1  NA  NA
# 2: 2   1  NA
# 3: 3   2   1
# 4: 4   3   2
# 5: 5   4   3
# 6: 6   5   4 

You can give the new columns any names you like (inside paste), and you don't need to bind the result back to the original, because the := operator updates the data set by reference.
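
Applied to the OBS_Q column from the question's real data, the same idea could look like the sketch below. Only two of the original columns are reproduced to keep it short, and the OBS_Q_lag1, OBS_Q_lag2 names are an arbitrary choice for illustration.

```r
library(data.table)

# Subset of the question's data: only the columns needed for this example.
x <- data.table(DATE  = 19630101:19630104,
                OBS_Q = c(1.61, 1.48, 1.4, 1.33))

n <- 2
lag_names <- paste0("OBS_Q_lag", seq_len(n))  # illustrative naming scheme

# shift() returns a list with one lagged vector per element of 1:n;
# := assigns them to the new columns by reference, so no cbind is needed.
x[, (lag_names) := shift(OBS_Q, 1:n)]
x
```

Wrapping lag_names in parentheses tells data.table to use the contents of the variable as the new column names rather than creating a column literally called lag_names.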

David Arenburg answered Oct 08 '22 17:10