
How to summarize value not matching the group using dplyr

Tags: r, dplyr

I want to sum the values of the rows that belong to groups other than the row's own group. For example, using this sample data:

> df <- data.frame(id=1:5, group=c("A", "A", "B", "B", "A"), val=seq(9, 1, -2))
> df
  id group val
1  1     A   9
2  2     A   7
3  3     B   5
4  4     B   3
5  5     A   1

Summarizing with dplyr by group

> df %>% group_by(group) %>% summarize(sumval = sum(val))
Source: local data frame [2 x 2]

   group sumval
  (fctr)  (dbl)
1      A     17
2      B      8

What I want is for each row to get the sum of val over the rows that are not in its group, so rows in group A get the sumval of everything except group A. That is, the final result should be:

  id group val notval
1  1     A   9      8
2  2     A   7      8
3  3     B   5     17
4  4     B   3     17
5  5     A   1      8

Is there a way to do this in dplyr, preferably in a single chain?

Ricky asked Nov 30 '22 16:11


2 Answers

We can do this with base R

 s1 <- sapply(unique(df$group), function(x) sum(df$val[df$group !=x]))
 s1[with(df, match(group, unique(group)))]
 #[1]  8  8 17 17  8
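
As an alternative sketch in base R (not the approach shown above), the same numbers can be obtained by subtracting each row's own group sum from the grand total. This assumes `val` has no missing values; `ave()` returns the per-group sum aligned to the original row order.

```r
# Sample data from the question
df <- data.frame(id = 1:5,
                 group = c("A", "A", "B", "B", "A"),
                 val = seq(9, 1, -2))

# ave() repeats each group's sum on every row of that group;
# grand total minus own-group sum leaves the sum of all other groups
df$notval <- sum(df$val) - ave(df$val, df$group, FUN = sum)
df$notval
```

This avoids the lookup through `match()` because the result is already aligned with the rows of `df`.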

Or using data.table

library(data.table)
setDT(df)[, notval := sum(df$val[df$group != group]), by = group]
akrun answered Dec 04 '22 13:12


@akrun's answers are best, but if you want to do it in dplyr, here is a roundabout way: compute the overall total first, then subtract each group's own sum.

library(dplyr)

df <- data.frame(id=1:5, group=c("A", "A", "B", "B", "A"), val=seq(9, 1, -2))

# TotalSum is computed before grouping, so it holds the grand total on every row
df %>% mutate(TotalSum = sum(val)) %>% group_by(group) %>%
  mutate(valsumval = TotalSum - sum(val))

Source: local data frame [5 x 5]
Groups: group [2]

         id  group   val TotalSum valsumval
      (int) (fctr) (dbl)    (dbl)     (dbl)
    1     1      A     9       25         8
    2     2      A     7       25         8
    3     3      B     5       25        17
    4     4      B     3       25        17
    5     5      A     1       25         8

This also works even if there are more than two groups.
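
To illustrate the claim about more than two groups, here is a small sketch. The data frame `df3` and its values are made up for this example and are not part of the original question; the chain itself is the same total-minus-group-sum idea, with the helper column dropped at the end.

```r
library(dplyr)

# Hypothetical three-group data (an assumption for illustration only)
df3 <- data.frame(id = 1:6,
                  group = c("A", "A", "B", "B", "C", "C"),
                  val = c(9, 7, 5, 3, 1, 2))

res <- df3 %>%
  mutate(TotalSum = sum(val)) %>%        # grand total, computed before grouping
  group_by(group) %>%
  mutate(notval = TotalSum - sum(val)) %>%  # total minus the row's own group sum
  ungroup() %>%
  select(-TotalSum)                      # drop the helper column

res
```

Here the grand total is 27, so rows in group A (sum 16) get 11, rows in B (sum 8) get 19, and rows in C (sum 3) get 24.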

This shorter version also works, as long as df in the chain is the same data frame that df$val refers to:

df %>% group_by(group) %>% mutate(notval = sum(df$val) - sum(val))
Koundy answered Dec 04 '22 12:12