Cumulative sum conditional to interval

Tags: r, dplyr

I would like to calculate the conditional sum of one data frame column over a set of intervals [n, +∞) (i.e., ≥ n) applied to another column. In the example data below, the intervals are applied to column a and the matching values in column b are summed. For [0, +∞), all values in column a are ≥ 0, so b_sum is the sum of all values in b. For [3, +∞), only one record has a ≥ 3, so b_sum is 500.

Input data

  a    b          
1.1  100          
2.3  150          
0.1   20          
0.5   80          
3.3  500          
1.6  200
1.1  180

Desired outcome

n  b_sum
0   1230
1   1130
2    650
3    500
4      0

I am sure this would be easy enough with a for loop; however, I would like to avoid that and use a vectorized base R or dplyr approach.

Alex Trueman asked Dec 06 '22 19:12


1 Answer

Vectorized solution

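df <- df[order(df$a), ]                # sort by "a"; findInterval() needs a sorted vector
ind <- findInterval(0:4, df$a) + 1     # for each n: (number of rows with a <= n) + 1
sum(df$b) - cumsum(c(0, df$b))[ind]    # total of b minus the b of rows counted below each n
#[1] 1230 1130  650  500    0
# Note: findInterval() counts a <= n by default, so a row with a exactly equal to n
# is left out of the sum; no such row exists in the example data.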
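
To see why this works, here is a self-contained sketch. It assumes the question's table is stored in a data frame named df (the construction below is a reconstruction; the question shows only the printed table), and the names n, below, running, b_sum and check are illustrative. It uses the left.open argument of findInterval(), available in current R releases, so that rows with a exactly equal to n are kept, which matches the ≥ n requirement. The last step cross-checks the result against the plain definition with outer().

```r
# Assumed reconstruction of the question's input table
df <- data.frame(
  a = c(1.1, 2.3, 0.1, 0.5, 3.3, 1.6, 1.1),
  b = c(100, 150, 20, 80, 500, 200, 180)
)
n <- 0:4  # lower bounds of the intervals [n, +Inf)

df <- df[order(df$a), ]  # findInterval() requires a sorted vector

# Number of rows with a < n (strictly), for each n
below <- findInterval(n, df$a, left.open = TRUE)

# running[k + 1] is the sum of b over the k rows with the smallest a
running <- cumsum(c(0, df$b))

# Sum of b over rows with a >= n: the total minus the part below n
b_sum <- sum(df$b) - running[below + 1]

# Cross-check against the definition: one logical column per n,
# each column multiplied elementwise by b and then summed
check <- colSums(outer(df$a, n, ">=") * df$b)

result <- data.frame(n = n, b_sum = b_sum)
print(result)
```

For the example data, b_sum comes out as 1230, 1130, 650, 500, 0, matching the desired outcome, and check agrees with it. The outer() version is shorter but builds a full length(a) by length(n) matrix, whereas the sorted cumulative-sum version scales better when both vectors are long.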

Khashaa answered Jan 04 '23 19:01