Take the following data.table, which has a column of dates and a column of numbers. I'd like to get the week of each date and then aggregate (sum) the values over each two-week period.
library(data.table)
Date <- as.Date(c("1980-01-01", "1980-01-02", "1981-01-05", "1981-01-05", "1982-01-08", "1982-01-15", "1980-01-16", "1980-01-17",
"1981-01-18", "1981-01-22", "1982-01-24", "1982-01-26"))
Runoff <- c(2, 1, 0.1, 3, 2, 5, 1.5, 0.5, 0.3, 2, 1.5, 4)
DT <- data.table(Date, Runoff)
DT
So from the date, I can easily get the year and week.
DT[,c("Date_YrWeek") := paste(substr(Date,1,4), week(Date), sep="-")][]
What I'm struggling with is aggregating over every two weeks. I thought that I'd get the first date for each week and filter using those values. Unfortunately, that would be pretty foolish.
DT[, .(Date = min(Date)), by = .(Date_YrWeek)][order(Date)]
The final result would end up being the sum of every two weeks.
weeks sum_value
1 and 2 ...
3 and 4 ...
5 and 6 ...
Anyone have an efficient way to do this with data.table?
1) Define the two-week periods as starting from the minimum Date. Then we can get the total Runoff for each such period like this.
DT[, .(sum_value = sum(Runoff)),
keyby = .(Date = 14 * (as.numeric(Date - min(Date)) %/% 14) + min(Date))]
giving the following, where the Date column is the first day of each two-week period.
         Date sum_value
1: 1980-01-01       3.0
2: 1980-01-15       2.0
3: 1980-12-30       3.1
4: 1981-01-13       2.3
5: 1981-12-29       2.0
6: 1982-01-12       6.5
7: 1982-01-26       4.0
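To see why the grouping expression in (1) works, it may help to look at the arithmetic on its own. The sketch below uses base R only and three of the dates from the question; the variable names (d, offset, bucket, start) are made up for illustration and are not part of the answer's code.

```r
# Three sample dates taken from the question's data
d <- as.Date(c("1980-01-01", "1980-01-16", "1981-01-05"))

# Days elapsed since the earliest date
offset <- as.numeric(d - min(d))

# Integer division by 14 assigns each date to a two-week bucket (0-based)
bucket <- offset %/% 14

# Multiplying back by 14 and adding the origin gives the bucket's first day
start <- min(d) + 14 * bucket

data.frame(d, offset, bucket, start)
```

Dates that fall in the same 14-day window share a bucket value, so grouping by start (or by bucket) sums them together. Note that the windows are counted from the earliest date in the data, not from calendar weeks.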
2) If you prefer the text shown in the question for the first column then:
DT[, .(sum_value = sum(Runoff)),
keyby = .(two_week = as.numeric(Date - min(Date)) %/% 14)][
, .(weeks = paste(2*two_week + 1, "and", 2*two_week + 2), sum_value)]
giving:
         weeks sum_value
1:     1 and 2       3.0
2:     3 and 4       2.0
3:   53 and 54       3.1
4:   55 and 56       2.3
5: 105 and 106       2.0
6: 107 and 108       6.5
7: 109 and 110       4.0
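Both versions above count weeks continuously from the first date, which is why the labels run past 52. If instead the pairs of weeks should restart every calendar year, one possible variation (an assumption about what is wanted, not something stated in the question) is to group by the year together with data.table's week() collapsed into pairs. A minimal sketch on a small subset of the data, with the table name DT2 and the column names Year and two_week chosen here for illustration:

```r
library(data.table)

# Small subset of the question's data (1980 rows only), for illustration
DT2 <- data.table(
  Date   = as.Date(c("1980-01-01", "1980-01-02", "1980-01-16", "1980-01-17")),
  Runoff = c(2, 1, 1.5, 0.5)
)

# Map weeks 1-2 to pair 1, weeks 3-4 to pair 2, and so on, within each year
res <- DT2[, .(sum_value = sum(Runoff)),
           keyby = .(Year = year(Date),
                     two_week = (week(Date) - 1L) %/% 2L + 1L)]
res
```

The last pair in a year can be shorter than 14 days with this scheme, so the periods are not all of equal length, unlike in (1) and (2).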
Update: Revised and added (2).