I have a relatively large dataframe (~2,000,000 rows) where, for each row, I need to, within that observation's group, calculate the sum of every value greater than or equal to the current row's value.
Here is a sample data frame:
sample_df = data.frame(
group_id = c(1,1,1,1,2,2,2,2),
value = c(10,12,14,12,8,8,21,10)
)
I currently have a very slow solution to do this using a loop and some filtering, however, having a faster solution would be much preferred. I've been trying to use dplyr however, I can not figure out how I can get the sum of the other observations after the data are grouped.
With the above toy example, here would be the desired output:
desired_output = data.frame(
group_id = c(1,1,1,1,2,2,2,2),
value = c(10,12,14,12,8,8,21,10),
output = c(38,26,0,26,39,39,0,21)
)
Looking around for solutions to this that have already been posted, I haven't seen a clear answer which explains how one can compare each observation in a group to the other observations, filtered on some criteria, in that group. I'd prefer a dplyr-based solution but if there are efficient base-R or data.table solutions than I'd be equally grateful!
This is a simple non-equi join problem:
library(data.table)
dt = as.data.table(sample_df)
dt[dt, on = .(group_id, value >= value), by = .EACHI,
.(output = sum(x.value) - i.value)]
# group_id value output
#1: 1 10 38
#2: 1 12 26
#3: 1 14 0
#4: 1 12 26
#5: 2 8 39
#6: 2 8 39
#7: 2 21 0
#8: 2 10 21
Using tidyverse
. The trick is to use map_dbl
to loop on every value
.
library(tidyverse)
sample_df %>%
group_by(group_id) %>%
mutate(output= map_dbl(value,~sum(value[value>=.x]))-value) %>%
ungroup
# A tibble: 8 x 3
group_id value output
<dbl> <dbl> <dbl>
1 1 10 38
2 1 12 26
3 1 14 0
4 1 12 26
5 2 8 39
6 2 8 39
7 2 21 0
8 2 10 21
value
in the mutate line is your value
'subcolumn' (the group), while .x
is its element you're looping on.
a base solution
within(sample_df,output <- unlist(tapply(
value,group_id,function(x) sapply(x,function(y) sum(x[x>=y])-y))))
# group_id value output
# 1 1 10 38
# 2 1 12 26
# 3 1 14 0
# 4 1 12 26
# 5 2 8 39
# 6 2 8 39
# 7 2 21 0
# 8 2 10 21
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With