Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Use of cumsum() iterativley in one column

Tags:

r

dplyr

cumsum

Is it possible to use cumsum() iteratively in one column with start - stop conditional on other column:

  1. given the dataframe df with one column X where values are ascending.
  2. cumsum() should stop when reaching 10 or a multiple of ten (e.g.20,30,40,...).
  3. then after reaching this point (10,20,30,40,..) cumsum() should start new... and so on.
  4. In case of ties (20,20 or 30,30) cumsum should stop in the last occurrence of 10, 20, 30, 40,... this is the dataframe:
df <- structure(list(X = c(55L, 95L, 39L, 52L, 22L, 93L, 76L, 82L, 
77L, 58L, 60L, 19L, 31L, 43L, 65L, 56L, 18L, 66L, 21L, 49L, 13L, 
37L, 36L, 51L, 41L, 7L, 91L, 3L, 11L, 65L, 51L, 32L, 25L, 10L, 
5L, 7L, 8L, 3L, 72L, 66L, 93L, 24L, 48L, 44L, 91L, 60L, 62L, 
89L, 100L, 69L, 5L, 89L, 54L, 19L, 39L, 93L, 60L, 64L, 89L, 81L, 
24L, 9L, 51L, 9L, 7L, 69L, 19L, 51L, 39L, 100L, 83L, 67L, 33L, 
84L, 66L, 25L, 20L, 87L, 55L, 56L, 83L, 52L, 96L, 42L, 16L, 64L, 
45L, 30L, 55L, 29L, 16L, 73L, 40L, 29L, 92L, 6L, 38L, 12L, 38L, 
89L)), class = "data.frame", row.names = c(NA, -100L))

Many thanks!!!

desired output as an example up to 30.

enter image description here

like image 759
TarJae Avatar asked Dec 14 '22 07:12

TarJae


2 Answers

UPDATE

Case-1: To handle the ties efficiently (OP's condition-4). Let's take a different example where there are ties as well as two consecutive values divisible by 10. (I think other strategies may fail there)
df1 <- data.frame(X = c(3, 4, 10, 10, 10, 13, 20, 30, 31, 40, 45))

df1 %>% arrange(X) %>% group_by(X) %>%
  mutate(d = n(),
         d2 = row_number(),
         d2 = d2 == max(d2)) %>% ungroup() %>% 
  group_by(Y = cumsum( X %% 10 == 0 & d2)) %>%
  mutate(Y = cumsum(X)) %>% ungroup() %>%
  select(-d, -d2)

# A tibble: 11 x 2
       X     Y
   <dbl> <dbl>
 1     3     3
 2     4     7
 3    10    17
 4    10    27
 5    10    10
 6    13    23
 7    20    20
 8    30    30
 9    31    61
10    40    40
11    45    85
This can be done through accumulate also.

Case-2: when group starts next value and ties are also handled properly

df1 %>% arrange(X) %>% group_by(X) %>%
  mutate(d = n(),
         d2 = row_number(),
         d2 = d2 == max(d2)) %>% ungroup() %>% 
  group_by(Y = lag(cumsum( X %% 10 == 0 & d2), default = 0)) %>%
  mutate(Y = cumsum(X)) %>% ungroup() %>%
  select(-d, -d2)
# A tibble: 11 x 2
       X     Y
   <dbl> <dbl>
 1     3     3
 2     4     7
 3    10    17
 4    10    27
 5    10    37
 6    13    13
 7    20    33
 8    30    30
 9    31    31
10    40    71
11    45    45

Earlier Answer

Case-3: When next cumsum starts from next value.
df %>% arrange(X) %>%
  mutate(y = accumulate(X, .init = 0, ~ifelse(.y %% 10 == 0, 1, 0))[-nrow(df)],
         y = accumulate2(X, y, .init = 0, ~ifelse(..3 == 1, ..2, ..1 + ..2))[-1])

     X    y
1     3    3
2     3    6
3     5   11
4     5   16
5     6   22
6     7   29
7     7   36
8     7   43
9     8   51
10    9   60
11    9   69
12   10   79
13   11   11
14   12   23
15   13   36
16   16   52
17   16   68
18   18   86
19   19  105
20   19  124
21   19  143
22   20  163
23   21   21
24   22   43
25   24   67
26   24   91
27   25  116
28   25  141
29   29  170
30   29  199
31   30  229
32   31   31
33   32   63
34   33   96
35   36  132
36   37  169
37   38  207
38   38  245
39   39  284
40   39  323
41   39  362
42   40  402
43   41   41
44   42   83
45   43  126
46   44  170
47   45  215
48   48  263
49   49  312
50   51  363
51   51  414
52   51  465
53   51  516
54   52  568
55   52  620
56   54  674
57   55  729
58   55  784
59   55  839
60   56  895
61   56  951
62   58 1009
63   60 1069
64   60   60
65   60   60
66   62   62
67   64  126
68   64  190
69   65  255
70   65  320
71   66  386
72   66  452
73   66  518
74   67  585
75   69  654
76   69  723
77   72  795
78   73  868
79   76  944
80   77 1021
81   81 1102
82   82 1184
83   83 1267
84   83 1350
85   84 1434
86   87 1521
87   89 1610
88   89 1699
89   89 1788
90   89 1877
91   91 1968
92   91 2059
93   92 2151
94   93 2244
95   93 2337
96   93 2430
97   95 2525
98   96 2621
99  100 2721
100 100  100
like image 164
AnilGoyal Avatar answered Jan 10 '23 08:01

AnilGoyal


We can arrange by 'X', create a grouping column with %/% and get the cumulative sum (cumsum) of 'X'

library(dplyr)
df %>% 
    arrange(X) %>%
    group_by(grp = lag(X  %/% 10, default = 0)) %>%
    mutate(new = cumsum(X))

-output

# A tibble: 100 x 3
# Groups:   grp [11]
#       X   grp   new
#   <int> <dbl> <int>
# 1     3     0     3
# 2     3     0     6
# 3     5     0    11
# 4     5     0    16
# 5     6     0    22
# 6     7     0    29
# 7     7     0    36
# 8     7     0    43
# 9     8     0    51
#10     9     0    60
# … with 90 more rows

If it needs to start at 10, 20,

df %>% 
    arrange(X) %>%
    group_by(grp = X  %/% 10) %>%
    mutate(new = cumsum(X))
like image 37
akrun Avatar answered Jan 10 '23 09:01

akrun