
Cumulative total by group

Tags: r, data.table

For the following dataset:

d = data.frame(date = seq(as.Date('2015-01-01'), as.Date('2015-04-10'), by = 'day'),
               group = rep(c('A','B','C','D'), 25), value = sample(1:100))
head(d)
         date group value
1: 2015-01-01     A     4
2: 2015-01-02     B    32
3: 2015-01-03     C    46
4: 2015-01-04     D    40
5: 2015-01-05     A    93
6: 2015-01-06     B    10

.. can anyone advise a more elegant way to calculate a cumulative total of values by group than this (data.table) method?

library(data.table)
setDT(d)
d.cast = dcast.data.table(d, group ~ date, value.var = 'value', fun.aggregate = sum)
c.sum = d.cast[, as.list(cumsum(unlist(.SD))), by = group]

.. which is pretty clunky and yields a flat matrix that needs tidyr::gather or reshape2::melt to reformat.

Surely R can do better than this??

geotheory asked May 22 '15 14:05



2 Answers

If you just want cumulative sums per group, then you can do

transform(d, new=ave(value,group,FUN=cumsum))

with base R.

MrFlick answered Sep 22 '22 05:09
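
To see what the ave() call in this answer produces, here is a minimal sketch on a toy data frame. The name `toy`, the column `total`, and the values are illustrative assumptions, not the question's random sample:

```r
# Toy data (hypothetical values chosen so the result is easy to check by hand)
toy <- data.frame(
  group = c("A", "B", "A", "B", "A"),
  value = c(1, 10, 2, 20, 3)
)

# ave() applies FUN within each level of `group` and returns the results
# in the original row order, so no reshaping is needed afterwards
toy$total <- ave(toy$value, toy$group, FUN = cumsum)

print(toy$total)  # 1 10 3 30 6
```

The running total follows row order within each group, so this assumes the rows are already sorted by date, as they are in the question's data.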


This should work:

library(dplyr)
d %>% 
  group_by(group) %>% 
  arrange(date) %>% 
  mutate(Total = cumsum(value))

Akhil Nair answered Sep 20 '22 05:09
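
Since the question started from data.table, the same grouped cumsum can probably be written there without casting to wide format. This is a sketch only, not taken from either answer; it assumes the data.table package is installed, and `toy` and `total` are illustrative names:

```r
library(data.table)

# Toy data (hypothetical values, not the question's random sample)
toy <- data.table(
  date  = as.Date("2015-01-01") + 0:4,
  group = c("A", "B", "A", "B", "A"),
  value = c(1, 10, 2, 20, 3)
)

# Sort by date first so the running total accumulates chronologically
setorder(toy, date)

# `:=` adds the column by reference; `by = group` restarts cumsum per group
toy[, total := cumsum(value), by = group]

print(toy$total)  # 1 10 3 30 6
```

The result stays in long format, so no gather or melt step is needed afterwards.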