
sum within (date)

Tags: r, base, sum

I'm trying to calculate the sum of values (in XYZmin) that occur on the same date.

My data looks like this:

bar <- structure(list(date = structure(c(15622, 15622, 15622, 15628, 
15632, 15635, 15639, 15639, 15639, 15639, 15639, 15642, 15646, 
15646, 15650, 15650, 15650, 15657, 15660, 15660, 15674, 15681, 
15691, 15695, 15709, 15716, 15723, 15730, 15737, 15737, 15737, 
15737, 15737, 15737, 15740, 15743, 15743, 15743, 15744, 15744, 
15744, 15744, 15746, 15751, 15755, 15758), class = "Date"), XYZmin = c(-20, 
-15, -10, -70, -60, -60, -95, -10, -10, -40, -25, -25, -20, -10, 
-3, -5, -25, -5, -70, -5, -30, -30, -25, 60, 60, 60, 60, 60, 
-10, -10, -30, -30, -10, -10, -10, -60, -30, -10, 75, -10, -10, 
-10, 60, 60, -15, 60)), .Names = c("date", "XYZmin"), class = "data.frame", row.names = c(NA, 
-46L))

head(bar)   
        date XYZmin
1 2012-10-09    -20
2 2012-10-09    -15
3 2012-10-09    -10
4 2012-10-15    -70
5 2012-10-19    -60
6 2012-10-22    -60

What I'm struggling to accomplish is to create a new variable XYZtot where, for dates that occur more than once, the 1st and 2nd values are summed at the 2nd row, the 1st, 2nd, and 3rd values at the 3rd row, and so on. Here is a snippet of what I am aiming at.

head(new_bar_with_XYZtot) 

        date XYZmin XYZtot
1 2012-10-09    -20    -20
2 2012-10-09    -15    -35
3 2012-10-09    -10    -45
4 2012-10-15    -70    -70
5 2012-10-19    -60    -60
6 2012-10-22    -60    -60
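
In other words, XYZtot is a cumulative sum of XYZmin within each date. As a point of reference, a minimal base R sketch of that idea (essentially the ave()/cumsum() approach that also appears in the benchmark below):

## cumulative sum of XYZmin within each date group
new_bar_with_XYZtot <- transform(bar, XYZtot = ave(XYZmin, date, FUN = cumsum))
head(new_bar_with_XYZtot)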

Update: a bigger microbenchmark test comparing the proposed solutions.

library(plyr)
alexwhan <- function(bar,date,XYZmin) ddply(bar, .(date), transform, XYZmin.sum = cumsum(XYZmin))

Arun <- function(bar,date,XYZmin) within(bar, {XYZtot <- ave( XYZmin, date, FUN=cumsum)})

agstudy <- function(bar,date,XYZmin) transform(bar, XYZtot = ave(XYZmin, date, FUN = cumsum))

# install.packages("data.table", dependencies = TRUE)
library(data.table)
mnel <- function(bar,date,XYZmin) { bar <- data.table(bar); bar[, XYZmin.sum := cumsum(XYZmin), by = date] }

# install.packages("microbenchmark", dependencies = TRUE)
require(microbenchmark)

# run test
res <- microbenchmark(alexwhan(bar,date,XYZmin), Arun(bar,date,XYZmin), agstudy(bar,date,XYZmin), mnel(bar,date,XYZmin), times = 666)


## Print results:
print(res)

The numbers:

Unit: microseconds
                        expr       min        lq    median        uq       max neval
 alexwhan(bar, date, XYZmin) 14484.077 15056.613 15237.760 15945.482 72650.126   666
     Arun(bar, date, XYZmin)   963.632  1018.311  1070.759  1138.655  4988.226   666
  agstudy(bar, date, XYZmin)  1967.292  2021.115  2078.261  2158.689  9240.500   666
     mnel(bar, date, XYZmin)   251.312   270.295   282.821   325.040  6540.367   666


### Plot results:
boxplot(res)

[Boxplot of the microbenchmark results]
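
It can also be worth confirming that the candidate functions agree on the result before comparing timings; a small sketch checking the two ave()-based versions against each other (they differ only in how the column is attached):

## Sanity check: both ave()-based solutions produce the same XYZtot column
all.equal(Arun(bar, date, XYZmin)$XYZtot,
          agstudy(bar, date, XYZmin)$XYZtot)
# [1] TRUE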

Asked Dec 16 '22 by Eric Fail

1 Answer

If you are going for speed, I'll put in a data.table solution:

library(data.table)
bar <- data.table(bar)

# assigning within bar
bar[, XYZmin.sum := cumsum(XYZmin), by = date]

This will scale well to large data!
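
For reference, the first few rows after the update look like this on the example data (print formatting may vary slightly between data.table versions):

head(bar)
         date XYZmin XYZmin.sum
1: 2012-10-09    -20        -20
2: 2012-10-09    -15        -35
3: 2012-10-09    -10        -45
4: 2012-10-15    -70        -70
5: 2012-10-19    -60        -60
6: 2012-10-22    -60        -60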

Answered Dec 18 '22 by mnel