I'm very very new to R and am looking at ways of recreating an Excel VBA macro and Excel worksheet functions such as SUMIFS. SUMIFS sums a column if the row has entries matching multiple conditions on its other columns.
I have the below data frame and I want to compute a new column. The new column is the sum of Sample
for all rows that overlap with the Start Date
and EndDate
range. For example on line 1
it would be 697
(the sum of the first 3 lines
). The criteria for the sum specifically: include Sample
if EndDate >= StartDate[i] & StartDate <=EndDate[i]
StartDate EndDate Sample *SUMIFS example*
10/01/14 24/01/14 139 *697*
12/01/14 26/01/14 136
19/01/14 02/02/14 422
25/01/14 08/02/14 762
29/01/14 12/02/14 899
05/02/14 19/02/14 850
07/02/14 21/02/14 602
09/02/14 23/02/14 180
18/02/14 04/03/14 866
Any comments or pointers would be greatly appreciated.
You could do this with a loop or with a Cartesian merge. I don't know of any built in functions to do exactly this.
library(dplyr)
x = structure(list(StartDate = structure(c(1389312000, 1389484800,
1390089600, 1390608000, 1390953600, 1391558400, 1391731200, 1391904000,
1392681600), tzone = "UTC", class = c("POSIXct", "POSIXt")),
EndDate = structure(c(1390521600, 1390694400, 1391299200,
1391817600, 1392163200, 1392768000, 1392940800, 1393113600,
1393891200), tzone = "UTC", class = c("POSIXct", "POSIXt"
)), Sample = c(139L, 136L, 422L, 762L, 899L, 850L, 602L,
180L, 866L)), .Names = c("StartDate", "EndDate", "Sample"
), row.names = c(NA, -9L), class = "data.frame")
x2 = x
names(x2)=c('StartDate2','EndDate2','Sample2')
x3 = merge(x,x2,allow.cartesian =T)
x4 = summarise(group_by(x3,StartDate,EndDate),
sumifs=sum(Sample2[EndDate2 >= StartDate & StartDate2 <= EndDate]))
x_sumifs = merge(x,x4,by=c('StartDate','EndDate'))
This is what the output looks like.
> x_sumifs
StartDate EndDate Sample sumifs
1 2014-01-10 2014-01-24 139 697
2 2014-01-12 2014-01-26 136 1459
3 2014-01-19 2014-02-02 422 2358
4 2014-01-25 2014-02-08 762 3671
5 2014-01-29 2014-02-12 899 3715
6 2014-02-05 2014-02-19 850 4159
7 2014-02-07 2014-02-21 602 4159
8 2014-02-09 2014-02-23 180 3397
9 2014-02-18 2014-03-04 866 2498
You could use lapply/sapply
from base R
to do this. x
from @cameron.bracken's post.
x$sumifs <- sapply(seq_len(nrow(x)), function(i) with(x,
sum(Sample[EndDate >= StartDate[i] & StartDate <= EndDate[i]])))
x
# StartDate EndDate Sample sumifs
#1 2014-01-10 2014-01-24 139 697
#2 2014-01-12 2014-01-26 136 1459
#3 2014-01-19 2014-02-02 422 2358
#4 2014-01-25 2014-02-08 762 3671
#5 2014-01-29 2014-02-12 899 3715
#6 2014-02-05 2014-02-19 850 4159
#7 2014-02-07 2014-02-21 602 4159
#8 2014-02-09 2014-02-23 180 3397
#9 2014-02-18 2014-03-04 866 2498
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With