I have the following code, I am calculating the percent growth in the data points and then I calculate the change in the percent growth, what I am looking for is to be able to add a column where I count the number of readings where the percent growth change is negative
df <- data.frame(id = c(1,2,3,4,5,6,7,8,9,10,11,12), data = c(19, 19, 27, 27, 38, 42, 47, 48, 49, 50, 51, 53))
df <- mutate(df, pct_growth = (data - lag(data))/lag(data))
df <- mutate(df, pct_growth_change = pct_growth - lag(pct_growth))
df$pct_growth_streak <- 0
df <- dplyr::mutate(df, pct_growth_streak = ifelse(pct_growth_change <=0, lag(pct_growth_streak)+1,0) )
What I am getting as the output is the following
id data pct_growth pct_growth_change pct_growth_streak
1 1 19 NA NA NA
2 2 19 0.00000000 NA NA
3 3 27 0.42105263 0.4210526316 0
4 4 27 0.00000000 -0.4210526316 1
5 5 38 0.40740741 0.4074074074 0
6 6 42 0.10526316 -0.3021442495 1
7 7 47 0.11904762 0.0137844612 0
8 8 48 0.02127660 -0.0977710233 1
9 9 49 0.02083333 -0.0004432624 1
10 10 50 0.02040816 -0.0004251701 1
11 11 51 0.02000000 -0.0004081633 1
12 12 53 0.03921569 0.0192156863 0
And what I need is
id data pct_growth pct_growth_change pct_growth_streak
1 1 19 NA NA NA
2 2 19 0.00000000 NA NA
3 3 27 0.42105263 0.4210526316 0
4 4 27 0.00000000 -0.4210526316 1
5 5 38 0.40740741 0.4074074074 0
6 6 42 0.10526316 -0.3021442495 1
7 7 47 0.11904762 0.0137844612 0
8 8 48 0.02127660 -0.0977710233 1
9 9 49 0.02083333 -0.0004432624 2
10 10 50 0.02040816 -0.0004251701 3
11 11 51 0.02000000 -0.0004081633 4
12 12 53 0.03921569 0.0192156863 0
lag lag shifts the times one back. It does not change the values, only the times. Thus lag changes the tsp attribute from c(1, 4, 1) to c(0, 3, 1) . The start time is shifted from 1 to 0, the end time is shifted from 4 to 3 and since shifts do not change the frequency the frequency remains 1.
The opposite of lag() function is lead()
We can use rleid
to create groups of consecutive streaks and calculate cumsum
over it.
library(data.table)
setDT(df)[, pct_growth_streak := cumsum(pct_growth_streak),
rleid(pct_growth_streak)]
df
# id data pct_growth pct_growth_change pct_growth_streak
# 1: 1 19 NA NA NA
# 2: 2 19 0.00000000 NA NA
# 3: 3 27 0.42105263 0.4210526316 0
# 4: 4 27 0.00000000 -0.4210526316 1
# 5: 5 38 0.40740741 0.4074074074 0
# 6: 6 42 0.10526316 -0.3021442495 1
# 7: 7 47 0.11904762 0.0137844612 0
# 8: 8 48 0.02127660 -0.0977710233 1
# 9: 9 49 0.02083333 -0.0004432624 2
#10: 10 50 0.02040816 -0.0004251701 3
#11: 11 51 0.02000000 -0.0004081633 4
#12: 12 53 0.03921569 0.0192156863 0
We can use it dplyr
too :
library(dplyr)
df %>%
group_by(grp = rleid(pct_growth_streak)) %>%
mutate(pct_growth_streak = cumsum(pct_growth_streak))
Or with ave
:
with(df, ave(pct_growth_streak, rleid(pct_growth_streak), FUN = cumsum))
One approach: first define a grouping variable sgrp
that increments with each sign change of pct_growth_change
:
df %<>% mutate(sgrp = cumsum(if_else(sign(pct_growth_change) ==
sign(lag(pct_growth_change, 1)), 0, 1, 1)))
Then group by sgrp
and set pct_growth_streak
as the row number within the group if pct_growth_change
is negative.
df %>%
group_by(sgrp) %>%
mutate(pct_growth_streak =
(pct_growth_change < 0) * row_number()
) %>%
ungroup() %>%
select(-sgrp);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With