
general lag in time series panel data

Tags: r, time-series, lag

I have a dataset akin to this

User    Date        Value
A       2012-01-01  4
A       2012-01-02  5   
A       2012-01-03  6
A       2012-01-04  7
B       2012-01-01  2
B       2012-01-02  3   
B       2012-01-03  4
B       2012-01-04  5

I want to create a lag of Value, respecting User.

User    Date        Value   Value.lag
A       2012-01-01  4       NA
A       2012-01-02  5       4
A       2012-01-03  6       5
A       2012-01-04  7       6
B       2012-01-01  2       NA
B       2012-01-02  3       2   
B       2012-01-03  4       3
B       2012-01-04  5       4

I've done it very inefficiently in a loop:

df$value.lag1 <- NA
levs <- levels(as.factor(df$User))
for (i in seq_along(levs)) {
  idx <- df$User == levs[i]                             # rows belonging to this user
  temper <- df$Value[idx]                               # this user's values, in row order
  df$value.lag1[idx] <- c(NA, temper[-length(temper)])  # shift down by one within the user
}

But this is very slow. I've looked at using by and tapply, but have not figured out how to make them work.

I don't think xts or ts will work because of the User element.

Any suggestions?

Daniel Egan asked Jan 18 '12 12:01



2 Answers

I think the easiest way, especially if you plan further analysis, is to convert your data frame to the pdata.frame class from the plm package.

After the conversion, the diff() and lag() operators can be used to create panel differences and lags.

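library(plm)
df <- pdata.frame(df, index = c("id", "date"))  # declare the individual and time index columns
df$l_value <- lag(df$value, 1)                  # df$value is a pseries, so lag() shifts within each id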
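
The per-group shift that a panel-aware lag performs can also be sketched in base R with ave(), which may help when plm is not available. This is only a minimal sketch on the example data from the question. It assumes the rows can be sorted by User and Date and that there are no gaps in the dates. The column name Value.lag is taken from the desired output in the question.

```r
# Example data from the question
df <- data.frame(
  User  = rep(c("A", "B"), each = 4),
  Date  = rep(as.Date("2012-01-01") + 0:3, times = 2),
  Value = c(4, 5, 6, 7, 2, 3, 4, 5)
)

# Sort so that rows are in time order within each user
df <- df[order(df$User, df$Date), ]

# ave() applies FUN to Value separately for each User and
# returns the results aligned with the original rows
df$Value.lag <- ave(df$Value, df$User,
                    FUN = function(x) c(NA, head(x, -1)))
df
```

Because the shift is done inside each group, the first row of every User gets NA and no value leaks from one user into the next.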
Andriy Levitskyy answered Sep 19 '22 18:09



For a panel without missing observations, this is an intuitive solution:

df <- data.frame(id = c(1, 1, 1, 1, 1, 2, 2), 
                 date = c(1992, 1993, 1991, 1990, 1994, 1992, 1991), 
                 value = c(4.1, 4.5, 3.3, 5.3, 3.0, 3.2, 5.2))

df <- df[with(df, order(id, date)), ]                    # sort by id, then by date
df$l_value <- c(NA, df$value[-length(df$value)])         # shift value down by one row
df$l_value[df$id != c(NA, df$id[-length(df$id)])] <- NA  # set the lag to NA where the previous row has a different id
df

  id date value l_value
4  1 1990   5.3      NA
3  1 1991   3.3     5.3
1  1 1992   4.1     3.3
2  1 1993   4.5     4.1
5  1 1994   3.0     4.5
7  2 1991   5.2      NA
6  2 1992   3.2     5.2
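
The restriction to panels without missing observations matters because the shift above is positional: if a period is absent, the previous row is not the previous period. One possible way around this, sketched here under the assumption that date is an integer year and that a lag of one means "date - 1", is to look up the previous period by key with match(). The data below are hypothetical and include a gap (id 1 has no row for 1992).

```r
# Hypothetical panel with a gap: id 1 has no observation for 1992
df <- data.frame(id    = c(1, 1, 1, 2, 2),
                 date  = c(1990, 1991, 1993, 1991, 1992),
                 value = c(5.3, 3.3, 4.5, 5.2, 3.2))

# Key for each row, and the key of the same id one period earlier
key      <- paste(df$id, df$date)
prev_key <- paste(df$id, df$date - 1)

# match() returns NA when the previous period is absent,
# and indexing with NA yields NA for those rows
df$l_value <- df$value[match(prev_key, key)]
df
```

Here the 1993 row of id 1 gets NA rather than the 1991 value, since 1992 is missing. The lookup does not depend on row order, so no sorting is needed.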
Fix.B. answered Sep 20 '22 18:09
