
Shift with dynamic n (number of positions to lead / lag by)

Tags: r, data.table

I have the below df:

library(data.table)

df <- data.table(user = c('a', 'a', 'a', 'b', 'b'),
                 spend = 1:5,
                 shift_by = c(1, 1, 2, 1, 1))
df

   user spend shift_by
1:    a     1        1
2:    a     2        1
3:    a     3        2
4:    b     4        1
5:    b     5        1

I want to create a lead/lag column, but this time the n parameter of data.table's shift function should be dynamic and take df$shift_by as input, so each row is shifted by its own number of positions. My expected result is:

df[, spend_shifted := c(NA, 1, 1, NA, 4)]; df

   user spend shift_by spend_shifted
1:    a     1        1            NA
2:    a     2        1             1
3:    a     3        2             1
4:    b     4        1            NA
5:    b     5        1             4

However, the attempt below gives:

df[, spend_shifted := shift(x=spend, n=shift_by, type="lag"), user]; df

   user spend shift_by spend_shifted
1:    a     1        1            NA
2:    a     2        1            NA
3:    a     3        2            NA
4:    b     4        1            NA
5:    b     5        1            NA

This is the closest example I could find. However, I need to group by user and am after a data.table solution for speed. I look forward to any ideas.
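For context, shift() reads a vector n as a request for several shifted copies of x (one per value of n), not as a per-row offset. A minimal sketch of that behaviour on a standalone vector (the name res is only for illustration):

```r
library(data.table)

# Passing several values to n returns a list with one shifted vector per value,
# rather than shifting each element by its own offset.
res <- shift(1:3, n = c(1, 1, 2), type = "lag")

is.list(res)   # a list, not a single vector
length(res)    # one element per value of n
res[[3]]       # the whole vector lagged by 2
```

This is why passing shift_by directly as n does not produce a single column with row-specific lags.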

Sweepy Dodo asked Nov 02 '21 14:11



1 Answer

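I believe this will work. You can drop the newindex column afterward.

# position within each user's rows that each row should read its value from
df[, newindex := rowid(user) - shift_by]
# positions before the start of the group have nothing to read: set them to 0
df[newindex < 0, newindex := 0]
# fill only rows with a valid position; spend[0] is dropped, so the lengths match
df[newindex > 0, spend_shifted := df[, spend[newindex], by = .(user)]$V1]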
#    user spend shift_by newindex spend_shifted
# 1:    a     1        1        0            NA
# 2:    a     2        1        1             1
# 3:    a     3        2        1             1
# 4:    b     4        1        0            NA
# 5:    b     5        1        1             4
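The assignment above lines up the grouped result with the filtered rows by position, so it appears to rely on each user's rows being contiguous in the table. As a hedged alternative, here is a minimal sketch that does the same lookup inside a single grouped call and needs no helper column. The local name idx is introduced only for illustration, and this has not been benchmarked against the version above.

```r
library(data.table)

df <- data.table(user = c('a', 'a', 'a', 'b', 'b'),
                 spend = 1:5,
                 shift_by = c(1, 1, 2, 1, 1))

df[, spend_shifted := {
  idx <- seq_len(.N) - shift_by  # position within the group to read from
  idx[idx < 1] <- NA             # nothing to read before the group starts
  spend[idx]                     # indexing with NA yields NA
}, by = user]

df$spend_shifted
```

For a lead instead of a lag, the same idea should work with seq_len(.N) + shift_by, since indexing past the end of a vector in R also returns NA.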
Wimpel answered Oct 26 '22 02:10