Currently I have a utility function that lags variables in a data.table by group. The function is simple:
panel_lag <- function(var, k) {
  if (k > 0) {
    # Bring past values forward k times
    return(c(rep(NA, k), head(var, -k)))
  } else {
    # Bring future values backward
    return(c(tail(var, k), rep(NA, -k)))
  }
}
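To make the behaviour concrete, here is a small sketch of what the function returns on a plain vector. The definition is repeated only so the snippet runs on its own, and the vector `v` is made up for illustration.

```r
# Same logic as panel_lag above, repeated so this snippet is self-contained
panel_lag <- function(var, k) {
  if (k > 0) {
    c(rep(NA, k), head(var, -k))   # shift values forward, pad the start
  } else {
    c(tail(var, k), rep(NA, -k))   # shift values backward, pad the end
  }
}

v <- c(10, 20, 30, 40, 50)         # illustrative data, not from the question

lagged <- panel_lag(v, 1)          # NA 10 20 30 40
led    <- panel_lag(v, -1)         # 20 30 40 50 NA
empty  <- panel_lag(v, 0)          # zero-length: k = 0 falls into the else branch
```

Note that `k = 0` returns an empty vector rather than `var` unchanged, so a caller that might pass zero would probably want to handle that case separately.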
I can then call this from a data.table:
library(data.table)
x <- data.table(a = 1:10,
                dte = sample(seq.Date(from = as.Date("2012-01-20"),
                                      to = as.Date("2012-01-30"), by = 1),
                             10))
x[, L1_a := panel_lag(a, 1)]  # This won't work correctly as `x` isn't keyed by date
setkey(x, dte)
x[, L1_a := panel_lag(a, 1)]  # This will
This requires that I check inside panel_lag whether x is keyed. Is there a better way to do lagging? The tables tend to be large, so they should really be keyed anyway. At the moment I just call setkey before I lag, but I would like to make sure I don't forget to key them. Is there a standard way people do this?
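For reference, the kind of check I have in mind might look like the sketch below. `assert_keyed_by` is a hypothetical helper, not something data.table provides, and it assumes the table should be keyed by exactly one column; `key()` and `setkey()` are the real data.table functions.

```r
library(data.table)

# Hypothetical helper: stop unless `dt` is keyed by exactly `order_col`.
assert_keyed_by <- function(dt, order_col) {
  # key() returns NULL when the table has no key
  if (!identical(key(dt), order_col)) {
    stop("expected table to be keyed by '", order_col, "'")
  }
  invisible(TRUE)
}

x <- data.table(a = 1:3, dte = as.Date("2012-01-20") + c(2, 0, 1))

res <- try(assert_keyed_by(x, "dte"), silent = TRUE)  # fails: no key yet
setkey(x, dte)
ok <- assert_keyed_by(x, "dte")                       # passes once keyed
```

This works, but it means every lagging helper has to know which column the table should be ordered by, which is what I would like to avoid.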
If you want to ensure that you lag in order of some other column, you could use the order function:
x[order(dte), L1_a := panel_lag(a, 1)]
Though if you're doing a lot of things in date order it would make sense to key it that way.
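As an alternative to a hand-written helper, data.table versions from 1.9.6 onward include a built-in `shift()` function that does lags and leads and combines with `by` for per-group lagging. The sketch below assumes a grouping column called `id`, which is made up for illustration, as are the data values.

```r
library(data.table)

# Illustrative panel: two groups, rows deliberately out of date order
x <- data.table(id  = c("a", "a", "a", "b", "b"),
                dte = as.Date("2012-01-20") + c(2, 0, 1, 1, 0),
                a   = c(30, 10, 20, 200, 100))

setkey(x, id, dte)  # sorts by id, then by dte within each id

# Lag and lead by one row within each group; missing positions become NA
x[, L1_a := shift(a, n = 1L, type = "lag"),  by = id]
x[, F1_a := shift(a, n = 1L, type = "lead"), by = id]
```

Because `shift()` works on the rows in their current order, the table still has to be sorted first, either with `setkey` as here or with `order()` in `i`.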