Is it possible to use cumsum()
iteratively in one column with start - stop conditional on other column:
df
with one column X
where values are ascending.cumsum()
should stop when reaching 10 or a multiple of ten (e.g.20,30,40,...).cumsum
should stop in the last occurrence of 10, 20, 30, 40,...
this is the dataframe:df <- structure(list(X = c(55L, 95L, 39L, 52L, 22L, 93L, 76L, 82L,
77L, 58L, 60L, 19L, 31L, 43L, 65L, 56L, 18L, 66L, 21L, 49L, 13L,
37L, 36L, 51L, 41L, 7L, 91L, 3L, 11L, 65L, 51L, 32L, 25L, 10L,
5L, 7L, 8L, 3L, 72L, 66L, 93L, 24L, 48L, 44L, 91L, 60L, 62L,
89L, 100L, 69L, 5L, 89L, 54L, 19L, 39L, 93L, 60L, 64L, 89L, 81L,
24L, 9L, 51L, 9L, 7L, 69L, 19L, 51L, 39L, 100L, 83L, 67L, 33L,
84L, 66L, 25L, 20L, 87L, 55L, 56L, 83L, 52L, 96L, 42L, 16L, 64L,
45L, 30L, 55L, 29L, 16L, 73L, 40L, 29L, 92L, 6L, 38L, 12L, 38L,
89L)), class = "data.frame", row.names = c(NA, -100L))
Many thanks!!!
desired output as an example up to 30.
UPDATE
Case-1: To handle the ties efficiently (OP's condition-4). Let's take a different example where there are ties as well as two consecutive values divisible by 10. (I think other strategies may fail there)df1 <- data.frame(X = c(3, 4, 10, 10, 10, 13, 20, 30, 31, 40, 45))
df1 %>% arrange(X) %>% group_by(X) %>%
mutate(d = n(),
d2 = row_number(),
d2 = d2 == max(d2)) %>% ungroup() %>%
group_by(Y = cumsum( X %% 10 == 0 & d2)) %>%
mutate(Y = cumsum(X)) %>% ungroup() %>%
select(-d, -d2)
# A tibble: 11 x 2
X Y
<dbl> <dbl>
1 3 3
2 4 7
3 10 17
4 10 27
5 10 10
6 13 23
7 20 20
8 30 30
9 31 61
10 40 40
11 45 85
This can be done through accumulate
also.
Case-2: when group starts next value and ties are also handled properly
df1 %>% arrange(X) %>% group_by(X) %>%
mutate(d = n(),
d2 = row_number(),
d2 = d2 == max(d2)) %>% ungroup() %>%
group_by(Y = lag(cumsum( X %% 10 == 0 & d2), default = 0)) %>%
mutate(Y = cumsum(X)) %>% ungroup() %>%
select(-d, -d2)
# A tibble: 11 x 2
X Y
<dbl> <dbl>
1 3 3
2 4 7
3 10 17
4 10 27
5 10 37
6 13 13
7 20 33
8 30 30
9 31 31
10 40 71
11 45 45
Earlier Answer
Case-3: When next cumsum starts from next value.df %>% arrange(X) %>%
mutate(y = accumulate(X, .init = 0, ~ifelse(.y %% 10 == 0, 1, 0))[-nrow(df)],
y = accumulate2(X, y, .init = 0, ~ifelse(..3 == 1, ..2, ..1 + ..2))[-1])
X y
1 3 3
2 3 6
3 5 11
4 5 16
5 6 22
6 7 29
7 7 36
8 7 43
9 8 51
10 9 60
11 9 69
12 10 79
13 11 11
14 12 23
15 13 36
16 16 52
17 16 68
18 18 86
19 19 105
20 19 124
21 19 143
22 20 163
23 21 21
24 22 43
25 24 67
26 24 91
27 25 116
28 25 141
29 29 170
30 29 199
31 30 229
32 31 31
33 32 63
34 33 96
35 36 132
36 37 169
37 38 207
38 38 245
39 39 284
40 39 323
41 39 362
42 40 402
43 41 41
44 42 83
45 43 126
46 44 170
47 45 215
48 48 263
49 49 312
50 51 363
51 51 414
52 51 465
53 51 516
54 52 568
55 52 620
56 54 674
57 55 729
58 55 784
59 55 839
60 56 895
61 56 951
62 58 1009
63 60 1069
64 60 60
65 60 60
66 62 62
67 64 126
68 64 190
69 65 255
70 65 320
71 66 386
72 66 452
73 66 518
74 67 585
75 69 654
76 69 723
77 72 795
78 73 868
79 76 944
80 77 1021
81 81 1102
82 82 1184
83 83 1267
84 83 1350
85 84 1434
86 87 1521
87 89 1610
88 89 1699
89 89 1788
90 89 1877
91 91 1968
92 91 2059
93 92 2151
94 93 2244
95 93 2337
96 93 2430
97 95 2525
98 96 2621
99 100 2721
100 100 100
We can arrange
by 'X', create a grouping column with %/%
and get the cumulative sum (cumsum
) of 'X'
library(dplyr)
df %>%
arrange(X) %>%
group_by(grp = lag(X %/% 10, default = 0)) %>%
mutate(new = cumsum(X))
-output
# A tibble: 100 x 3
# Groups: grp [11]
# X grp new
# <int> <dbl> <int>
# 1 3 0 3
# 2 3 0 6
# 3 5 0 11
# 4 5 0 16
# 5 6 0 22
# 6 7 0 29
# 7 7 0 36
# 8 7 0 43
# 9 8 0 51
#10 9 0 60
# … with 90 more rows
If it needs to start at 10, 20,
df %>%
arrange(X) %>%
group_by(grp = X %/% 10) %>%
mutate(new = cumsum(X))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With