Problem using lag() within a mutate() function (tidyverse)

Tags: r, dplyr

I am trying to add another column to a dataframe where the new column is a function of the previous value in the new column and a current row value. I have tried to strip out irrelevant code and stick in easy numbers so that I might understand answers here. Given the following dataframe:

  x
1 1
2 2
3 3
4 4
5 5

The next column (y) will add 5 to x and also add the previous row's value for y. There's no previous value for y in the first row, so I define it as 0. So the first row value for y would be x+5+0 or 1+5+0 or 6. The second row would be x+5+y(from 1st row) or 2+5+6 or 13. The dataframe should look like this:

  x  y
1 1  6
2 2 13
3 3 21
4 4 30
5 5 40

I tried this with case_when() and lag() functions like this:

test_df <- data.frame(x = 1:5)
test_df %>% mutate(y = case_when(x == 1 ~ 6,
                                 x > 1 ~ x + 5 + lag(y)))

Error: Problem with mutate() column y.
ℹ y = case_when(x == 1 ~ 6, x > 1 ~ x + 5 + lag(y)).
x object 'y' not found
Run rlang::last_error() to see where the error occurred.

I had thought y was defined when the first row was calculated. Is there a better way to do this? Thanks!

CosmicSpittle asked Jan 25 '23 05:01


2 Answers

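You don't need lag() here at all. mutate() evaluates the whole expression on complete columns at once, so y does not exist yet when lag(y) is called. Since each y is just the running total of x + 5, a cumsum() suffices.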

test_df %>% mutate(y = cumsum(x + 5))

#>   x  y
#> 1 1  6
#> 2 2 13
#> 3 3 21
#> 4 4 30
#> 5 5 40

Data

test_df <- data.frame(x = 1:5)
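
To see why cumsum() matches the recurrence in the question, here is a minimal base R sketch that builds y row by row and compares it with the cumsum() result. The names y_loop, y_cumsum and prev are made up for this illustration; they do not appear in the question.

```r
# Sketch: compute y row by row as described in the question,
# then check that cumsum(x + 5) gives the same values.
test_df <- data.frame(x = 1:5)

y_loop <- numeric(nrow(test_df))
prev <- 0  # the first row has no previous y, so start from 0
for (i in seq_len(nrow(test_df))) {
  y_loop[i] <- test_df$x[i] + 5 + prev
  prev <- y_loop[i]  # carry the new y forward to the next row
}

y_cumsum <- cumsum(test_df$x + 5)

print(y_loop)
print(isTRUE(all.equal(y_loop, y_cumsum)))
```

The loop is only there to make the row-by-row dependency explicit; for this particular recurrence the vectorised cumsum() call is the simpler choice.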

Allan Cameron answered Feb 04 '23 19:02


We can also use purrr::accumulate here:

library(dplyr)
library(purrr)

test_df %>% mutate(y = accumulate(x + 5, ~ .x + .y))

  x  y
1 1  6
2 2 13
3 3 21
4 4 30
5 5 40

We can also use accumulate with an ordinary R function instead of the formula shorthand:

test_df %>% mutate(y = accumulate(x + 5, function(x, y) {x + y}))
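
If adding purrr is not an option, base R's Reduce() with accumulate = TRUE follows the same pattern. This is a sketch rather than part of the original answer; the argument names prev and cur are chosen here purely for readability.

```r
# Sketch: base R equivalent of the accumulate() call, no packages needed.
test_df <- data.frame(x = 1:5)

# Reduce() feeds the running result (prev) and the next element (cur)
# into the function; accumulate = TRUE keeps every intermediate result.
test_df$y <- Reduce(function(prev, cur) prev + cur,
                    test_df$x + 5,
                    accumulate = TRUE)

print(test_df)
```

Because the step is an arbitrary function, the body can be swapped for a recurrence that cumsum() cannot express.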

GuedesBF answered Feb 04 '23 18:02