I would like to calculate the conditional sum of a data frame column for a set of intervals [n, +∞) (i.e., ≥ n) applied to another column. In the example data below, the intervals are applied to column a, and the values in column b are conditionally summed. For [0, +∞), all values in column a are ≥ 0, so b_sum is the sum of all values. For [3, +∞), only one record is ≥ 3, so b_sum is 500.
Input data
a b
1.1 100
2.3 150
0.1 20
0.5 80
3.3 500
1.6 200
1.1 180
Desired outcome
n b_sum
0 1230
1 1130
2 650
3 500
4 0
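For reference, here is a direct translation of the definition that handles one threshold at a time with vapply(). It is an implicit loop, so it is not what the question is after, but it is handy for checking faster versions. It rebuilds the input data shown above and should reproduce the desired outcome table.

```r
# Input data from the question
df <- data.frame(a = c(1.1, 2.3, 0.1, 0.5, 3.3, 1.6, 1.1),
                 b = c(100, 150, 20, 80, 500, 200, 180))
n <- 0:4
# For each threshold k, sum b over the rows where a >= k
# (sum() of an empty selection is 0, which covers n = 4)
b_sum <- vapply(n, function(k) sum(df$b[df$a >= k]), numeric(1))
result <- data.frame(n = n, b_sum = b_sum)
result
```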
I am sure this would be easy enough using a for loop; however, I would like to avoid this approach and use a vectorized base R or dplyr approach.
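One fully vectorized base R option is to build a logical matrix with outer() and let colSums() do the summing. This sketch assumes that (number of rows) × (number of thresholds) fits comfortably in memory.

```r
df <- data.frame(a = c(1.1, 2.3, 0.1, 0.5, 3.3, 1.6, 1.1),
                 b = c(100, 150, 20, 80, 500, 200, 180))
n <- 0:4
# Logical matrix: entry [i, j] is TRUE when df$a[i] >= n[j]
mask <- outer(df$a, n, ">=")
# Multiplying by df$b recycles it down each column (TRUE -> b, FALSE -> 0),
# so colSums() returns the conditional sum for every threshold at once
b_sum <- colSums(mask * df$b)
b_sum  # 1230 1130 650 500 0
```

This costs time and memory proportional to rows × thresholds, whereas the sort + cumsum approach only needs one sort plus a binary search per threshold, so it scales better for large inputs.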
Vectorized solution
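df <- data.frame(a = c(1.1, 2.3, 0.1, 0.5, 3.3, 1.6, 1.1),
                 b = c(100, 150, 20, 80, 500, 200, 180))
df <- df[order(df$a), ]  # sort by "a" column
# left.open = TRUE counts the rows with a < n, so subtracting their cumulative
# sum from the total leaves the rows with a >= n (ties with n are kept)
ind <- findInterval(0:4, df$a, left.open = TRUE) + 1
sum(df$b) - cumsum(c(0, df$b))[ind]
#[1] 1230 1130 650 500 0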
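The same cumsum/findInterval idea can be wrapped in a small function that returns a table shaped like the desired outcome. The helper name sum_b_at_least is hypothetical. It passes left.open = TRUE so that a row whose a equals n exactly is included, matching the ≥ n in the question; with the default left.open = FALSE such ties would be dropped, giving a > n instead. The sample data has no ties, so both settings produce the same numbers there.

```r
# Hypothetical helper: for each threshold in n, sum b over rows with a >= n
sum_b_at_least <- function(a, b, n) {
  o <- order(a)  # findInterval() needs a non-decreasing vector
  a <- a[o]
  b <- b[o]
  # Number of values strictly below each threshold
  below <- findInterval(n, a, left.open = TRUE)
  # Total minus the cumulative sum of the rows below the threshold
  sum(b) - cumsum(c(0, b))[below + 1]
}

df <- data.frame(a = c(1.1, 2.3, 0.1, 0.5, 3.3, 1.6, 1.1),
                 b = c(100, 150, 20, 80, 500, 200, 180))
result <- data.frame(n = 0:4, b_sum = sum_b_at_least(df$a, df$b, 0:4))
result

# Tie check: a == 2 is counted for n = 2, so the result is 20 + 30
sum_b_at_least(c(1, 2, 3), c(10, 20, 30), 2)
```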