I'm trying to calculate the sum of values (in XYZmin
) that occur on the same date.
My data looks like this,
bar <- structure(list(date = structure(c(15622, 15622, 15622, 15628,
15632, 15635, 15639, 15639, 15639, 15639, 15639, 15642, 15646,
15646, 15650, 15650, 15650, 15657, 15660, 15660, 15674, 15681,
15691, 15695, 15709, 15716, 15723, 15730, 15737, 15737, 15737,
15737, 15737, 15737, 15740, 15743, 15743, 15743, 15744, 15744,
15744, 15744, 15746, 15751, 15755, 15758), class = "Date"), XYZmin = c(-20,
-15, -10, -70, -60, -60, -95, -10, -10, -40, -25, -25, -20, -10,
-3, -5, -25, -5, -70, -5, -30, -30, -25, 60, 60, 60, 60, 60,
-10, -10, -30, -30, -10, -10, -10, -60, -30, -10, 75, -10, -10,
-10, 60, 60, -15, 60)), .Names = c("date", "XYZmin"), class = "data.frame", row.names = c(NA,
-46L))
head(bar)
date XYZmin
1 2012-10-09 -20
2 2012-10-09 -15
3 2012-10-09 -10
4 2012-10-15 -70
5 2012-10-19 -60
6 2012-10-22 -60
What I'm struggling to accomplish is to create a new variables XYZtot
where, within datas that occur more then once, and sum the 1st and 2nd value at the 2nd data, and sum the 1st, 2nd, and 3rd value at the 3rd data. Here is a snipped of what I am aiming at.
head(new_bar_with_XYZtot)
date XYZmin XYZtot
1 2012-10-09 -20 -20
2 2012-10-09 -15 -35
3 2012-10-09 -10 -40
4 2012-10-15 -70 -70
5 2012-10-19 -60 -60
6 2012-10-22 -60 -60
microbenchmark
testalexwhan <- function(bar,date,XYZmin) ddply(bar, .(date), transform, XYZmin.sum = cumsum(XYZmin))
Arun <- function(bar,date,XYZmin) within(bar, {XYZtot <- ave( XYZmin, date, FUN=cumsum)})
agstudy <- function(bar,date,XYZmin) transform(bar, XYZtot = ave(XYZmin, date, FUN = cumsum))
# install.packages("data.table", dependencies = TRUE)
library(data.table)
mnel <- function(bar,date,XYZmin) bar <- data.table(bar); bar[, XYZmin.sum := cumsum(XYZmin), by = date]
# install.packages("microbenchmark", dependencies = TRUE)
require(microbenchmark)
# run test
res <- microbenchmark(alexwhan(bar,date,XYZmin), Arun(bar,date,XYZmin), agstudy(bar,date,XYZmin), mnel(bar,date,XYZmin), times = 666)
## Print results:
print(res)
the numbers,
Unit: microseconds
expr min lq median uq max neval
alexwhan(bar, date, XYZmin) 14484.077 15056.613 15237.760 15945.482 72650.126 666
Arun(bar, date, XYZmin) 963.632 1018.311 1070.759 1138.655 4988.226 666
agstudy(bar, date, XYZmin) 1967.292 2021.115 2078.261 2158.689 9240.500 666
mnel(bar, date, XYZmin) 251.312 270.295 282.821 325.040 6540.367 666
### Plot results:
boxplot(res)
If you are going for speend, I'll put in a data.table
solution
library(data.table)
bar <- data.table(bar)
# assigning within bar
bar[, XYZmin.sum := cumsum(XYZmin), by = date]
This will scale for large data!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With