I have the below df:
library(data.table)
df <- data.table(user = c('a', 'a', 'a', 'b', 'b')
, spend = 1:5
, shift_by = c(1,1,2,1,1)
); df
user spend shift_by
1: a 1 1
2: a 2 1
3: a 3 2
4: b 4 1
5: b 5 1
I am looking to create a lead/lag column, only this time the n parameter of data.table's shift function is dynamic and takes df$shift_by as input. My expected result is:
df[, spend_shifted := c(NA, 1, 1, NA, 4)]; df
user spend shift_by spend_shifted
1: a 1 1 NA
2: a 2 1 1
3: a 3 2 1
4: b 4 1 NA
5: b 5 1 4
However, with the below attempt it gives:
df[, spend_shifted := shift(x=spend, n=shift_by, type="lag"), user]; df
user spend shift_by spend_shifted
1: a 1 1 NA
2: a 2 1 NA
3: a 3 2 NA
4: b 4 1 NA
5: b 5 1 NA
This is the closest example I could find. However, I need a group-by and am after a data.table solution because of speed. I look forward to any ideas.
In practice, we often want to create leads and lags of more than one element of our vector. We can simply do that by specifying the number of steps within the lead… Such operations are especially useful for time series data, where we want to predict the future.
x: A vector, list, data.frame or data.table.
n: Integer vector denoting the offset by which to lead or lag the input. To create multiple lead/lag vectors, provide multiple values to n; negative values of n will "flip" the value of type, i.e., n=-1 and type='lead' is the same as n=1 and type='lag'.
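To illustrate the description of n above, here is a small sketch using data.table's shift on a toy vector. The names x and res are made up for this example. It suggests that a vector of offsets produces one shifted vector per offset rather than a per-row offset, which may be why passing shift_by directly does not give the expected column.

```r
library(data.table)

x <- 1:5

# Two values of n give a list with one shifted vector per offset
res <- shift(x, n = 1:2, type = "lag")
res[[1]]  # lag by 1: NA 1 2 3 4
res[[2]]  # lag by 2: NA NA 1 2 3

# A negative n flips the type: lead by -1 is the same as lag by 1
flipped <- shift(x, n = -1L, type = "lead")
```

So n is applied to the whole input once per value; it is not matched row by row against x.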
First we ask the user to input N integers and store them in the array variable a[N]. We then ask the user for the number of positions to shift the elements of the array, and then for the direction of shifting. If the user inputs 1, it is a LEFT shift; if the user inputs 0, it is a RIGHT shift.
As you can see from the previous RStudio console outputs, the lead function moved every value of our vector one position earlier (i.e. it cut off the first value and added an NA at the end), and the lag function moved every value one position later (i.e. it cut off the last value and added an NA at the beginning).
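The lead and lag behaviour described above can be reproduced with data.table's shift. This is only a sketch on a toy vector; x, lead_x and lag_x are names introduced here for illustration.

```r
library(data.table)

x <- 1:5

# Lead: drops the first value and pads with NA at the end
lead_x <- shift(x, n = 1L, type = "lead")  # 2 3 4 5 NA

# Lag: drops the last value and pads with NA at the beginning
lag_x <- shift(x, n = 1L, type = "lag")    # NA 1 2 3 4
```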
I believe this will work. You can drop the newindex-column afterward.
df[, newindex := rowid(user) - shift_by]
df[newindex < 0, newindex := 0]
df[newindex > 0, spend_shifted := df[, spend[newindex], by = .(user)]$V1]
# user spend shift_by newindex spend_shifted
# 1: a 1 1 0 NA
# 2: a 2 1 1 1
# 3: a 3 2 1 1
# 4: b 4 1 0 NA
# 5: b 5 1 1 4
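A possible variant of the same indexing idea, sketched here without the helper column. This is not the answerer's code: idx is a name introduced for illustration, and the sketch assumes shift_by holds positive lag offsets. Within each user it computes the row to read from and returns NA when that row would fall before the start of the group.

```r
library(data.table)

df <- data.table(user = c('a', 'a', 'a', 'b', 'b'),
                 spend = 1:5,
                 shift_by = c(1, 1, 2, 1, 1))

df[, spend_shifted := {
  idx <- seq_len(.N) - shift_by  # position to read from within the group
  idx[idx < 1] <- NA             # no earlier row available, so yield NA
  spend[idx]                     # indexing with NA returns NA
}, by = user]

df$spend_shifted  # NA 1 1 NA 4
```

Because the index is built inside the by group, no extra column has to be dropped afterward.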