Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to sum values across different rows and summarise as one row (R)

Tags:

dataframe

r

I have employee payments data that comes out as one row = one payment record. The variables describe the name, what payment it was and the value.

My end goal is to have a data frame in which each employee = one row with the different types of payments summed up and each payment type has its own variable.

Please see example:

data <- data.frame("name" = c("John", "John", "John", "Marie", "Marie", "Alex"),
               "payment.reason" = c("bonus", "bonus", "commission", "commission", "commission", "discretionary bonus"),
               "value" = c(1000, 5000, 2500, 1500, 500, 2500))

which looks like this:

   name      payment.reason value
1  John               bonus  1000
2  John               bonus  5000
3  John          commission  2500
4 Marie          commission  1500
5 Marie          commission   500
6  Alex discretionary bonus  2500

and this is the end result I am after:

goal
   name bonus commission discretionary.bonus
1  John  6000       2500                   0
2 Marie     0       2000                   0
3  Alex     0          0                2500

I know I'll need to spread the data to push the payment.reason values into columns, but I am struggling to figure out how to sum each individual payment type value for each person and have the data come out grouped by each person.

Thank you in advance!

like image 719
vsx99 Avatar asked Nov 28 '22 21:11

vsx99


2 Answers

We can do all of this with pivot_wider in tidyr:

library(tidyr)

pivot_wider(data, name, names_from = payment.reason, values_from = value, values_fn = list(value = sum))
#> # A tibble: 3 x 4
#>   name  bonus commission `discretionary bonus`
#>   <fct> <dbl>      <dbl>                 <dbl>
#> 1 John   6000       2500                    NA
#> 2 Marie    NA       2000                    NA
#> 3 Alex     NA         NA                  2500

Created on 2019-12-23 by the reprex package (v0.3.0)

Note (as in @AlexB's answer) that you can also add values_fill = list(value = 0) if you need explicit 0s instead of NA.

like image 92
MSR Avatar answered Dec 05 '22 12:12

MSR


We can use dcast from data.table and make use of the fun.aggregate

library(data.table)
dcast(setDT(data), name ~ payment.reason, value.var = 'value', sum)
#    name bonus commission discretionary bonus
#1:  Alex     0          0                2500
#2:  John  6000       2500                   0
#3: Marie     0       2000                   0

Or xtabs from base R

xtabs(value ~ name + payment.reason, data)
#    payment.reason
#name    bonus commission discretionary bonus
#  Alex      0          0                2500
#  John   6000       2500                   0
#  Marie     0       2000                   0
like image 24
akrun Avatar answered Dec 05 '22 11:12

akrun