
dplyr lag with n from column values

Tags: r, dplyr

Is it possible to use column values as n in a dplyr::lag function?

Reproducible example:

library(dplyr)

DF <- data.frame(
    V = runif(1000, min=-100, max=100),
    nlag = as.integer(runif(1000, min=1, max=10))
) %>%
    mutate(Vlag = lag(V, n = nlag))

I get this error:

Error: Evaluation error: n must be a nonnegative integer scalar, not integer of length 1000.

Is there any other alternative?

Update:

How do we solve the same problem within groups?

Reproducible example:

DF <- data.frame(
    V = runif(1000, min=-100, max=100),
    nlag = as.integer(runif(1000, min=1, max=10)),
    type = sample(1:4, 1000, replace=TRUE)
) %>%
    group_by(type) %>%
    mutate(Vlag = lag(V, n = nlag))

Medical physicist asked Aug 28 '18 08:08


2 Answers

The documentation at ?lag says

n
a positive integer of length 1, giving the number of positions to lead or lag by

So n must be a single value; it cannot be a vector of length greater than 1.

We can, however, compute the index of the lagged value ourselves: subtract each row's nlag from its row index, then use the result to look up V. Rows whose index would fall before the first row get NA.

df$lag_value <- sapply(seq_along(df$nlag), function(x) {
  # Row that holds the lagged value for row x
  indx <- x - df$nlag[x]
  if (indx > 0)
    df$V[indx]
  else
    NA
})
df

#          V nlag lag_value
#1  51.30453    6        NA
#2 -66.33709    4        NA
#3  95.45096    9        NA
#4  44.54434    3  51.30453
#5  62.00180    3 -66.33709
#6 -18.43012    4 -66.33709
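
The same index arithmetic can also be written without sapply. The following is only a sketch on toy data (the V values are made up for illustration, the nlag values match the output above): it builds the whole index vector at once and relies on the fact that subsetting with NA returns NA.

```r
# Toy data standing in for df; the V values are illustrative only.
df <- data.frame(
  V    = c(10, 20, 30, 40, 50, 60),
  nlag = c(6L, 4L, 9L, 3L, 3L, 4L)
)

# Row position each lagged value should come from.
idx <- seq_len(nrow(df)) - df$nlag

# Positions before the first row have no lagged value: mark them NA,
# so that the lookup below returns NA for those rows.
idx[idx < 1] <- NA

df$lag_value <- df$V[idx]
df
```

On 1000 rows either version is fast; the vectorized form mainly avoids calling a function once per row.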

Update

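If we want to do this by group, we can split the data by the type column, apply the same operation to each piece, and put the results back in the original row order with unsplit. (A plain unlist would return the values ordered by group, which only lines up with df if it is already sorted by type.)

df$lag_value <- unsplit(lapply(split(df, df$type), function(x)
  sapply(seq_along(x$nlag), function(y) {
    # Position within the group that holds the lagged value
    indx <- y - x$nlag[y]
    if (indx > 0)
      x$V[indx]
    else
      NA
  })), df$type)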

data

df <- head(DF)

Ronak Shah answered Oct 17 '22 05:10


n must have length 1, so a column cannot be passed to it directly. Try indexing V yourself instead:

DF <- data.frame(
  V = runif(1000, min=-100, max=100),
  nlag = as.integer(runif(1000, min=1, max=10))
) %>%
  mutate(Vlag = V[if_else((row_number() - nlag) < 1,
                          NA_integer_,
                          row_number() - nlag)])
                V nlag         Vlag
1     -6.72598341    4           NA
2    -84.67472238    2           NA
3     -4.98048104    7           NA
4      2.64957272    4           NA
5     82.16284532    4  -6.72598341
6     28.93483448    9           NA
7     88.16730371    3   2.64957272
8     42.31721302    7  -6.72598341
9    -38.12659876    1  42.31721302
10    74.62628153    3  88.16730371
...
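
For the grouped case from the update, the same indexing trick should carry over, because inside a grouped mutate both V and row_number() refer to the current group only. This is a sketch, not a tested answer to the original data: the seed, the 20-row size, and the helper column idx are assumptions made for illustration.

```r
library(dplyr)

set.seed(1)  # arbitrary seed so the example is repeatable
DF <- data.frame(
  V    = runif(20, min = -100, max = 100),
  nlag = sample(1:3, 20, replace = TRUE),
  type = sample(1:4, 20, replace = TRUE)
)

res <- DF %>%
  group_by(type) %>%
  mutate(
    # Position within the current group that holds the lagged value
    idx  = row_number() - nlag,
    # Positions before the start of the group become NA
    Vlag = V[if_else(idx < 1L, NA_integer_, idx)]
  ) %>%
  ungroup() %>%
  select(-idx)

res
```

The rows of res stay in the original order of DF, so no re-sorting is needed after the grouped step.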

Juan Antonio Roldán Díaz answered Oct 17 '22 04:10