Data.Table: Aggregate by every two weeks

Tags: r, data.table

So let's take the following data.table. It has a column of dates and a column of numbers. I'd like to get the week of each date and then sum the numbers over each two-week period.

library(data.table)

Date <- as.Date(c("1980-01-01", "1980-01-02", "1981-01-05", "1981-01-05",
                  "1982-01-08", "1982-01-15", "1980-01-16", "1980-01-17",
                  "1981-01-18", "1981-01-22", "1982-01-24", "1982-01-26"))
Runoff <- c(2, 1, 0.1, 3, 2, 5, 1.5, 0.5, 0.3, 2, 1.5, 4)
DT <- data.table(Date, Runoff)
DT

So from the date, I can easily get the year and week.

DT[, Date_YrWeek := paste(substr(Date, 1, 4), week(Date), sep = "-")][]

What I'm struggling with is aggregating over every two weeks. I thought that I'd get the first date of each week and filter using those values. Unfortunately, that seems like a pretty clumsy way to do it.

DT[, .(Date = min(Date)), by = .(Date_YrWeek)][order(Date)]

The final result should be the sum over each two-week period.

weeks    sum_value
1 and 2  ...
3 and 4  ...
5 and 6  ...

Anyone have an efficient way to do this with data.table?

asked Sep 19 '25 18:09 by ATMA


1 Answer

1) Define the two-week periods as consecutive 14-day blocks starting from the minimum Date. Then we can get the total Runoff for each such period like this.

# Days since the first date, floored to a multiple of 14, give each period's start date.
DT[, .(sum_value = sum(Runoff)),
   keyby = .(Date = 14 * (as.numeric(Date - min(Date)) %/% 14) + min(Date))]

giving the following, where the Date column is the first day of each two-week period.

         Date sum_value
1: 1980-01-01       3.0
2: 1980-01-15       2.0
3: 1980-12-30       3.1
4: 1981-01-13       2.3
5: 1981-12-29       2.0
6: 1982-01-12       6.5
7: 1982-01-26       4.0
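
To make the arithmetic in (1) concrete, here is a minimal sketch that works through a single date from the question in base R. The variable names (origin, d, days_since, period_index, period_start) are only illustrative and do not appear in the answer.

```r
# Worked example for one date, base R only.
origin <- as.Date("1980-01-01")   # min(Date) in the question's data
d      <- as.Date("1981-01-05")   # one of the question's dates

days_since   <- as.numeric(d - origin)       # whole days elapsed since the origin
period_index <- days_since %/% 14            # 0-based index of the 14-day block
period_start <- origin + 14 * period_index   # first day of that block

period_start
```

Here days_since is 370 (1980 is a leap year), so period_index is 26 and period_start is 1980-12-30, which matches row 3 of the output above.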

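2) If you prefer the week labels shown in the question for the first column, then use the following. Note that the weeks are counted from the minimum Date, so the numbers keep increasing across years.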

DT[, .(sum_value = sum(Runoff)), 
    keyby = .(two_week = as.numeric(Date - min(Date)) %/% 14)][
    , .(weeks = paste(2*two_week + 1, "and", 2*two_week + 2), sum_value)]

giving:

         weeks sum_value
1:     1 and 2       3.0
2:     3 and 4       2.0
3:   53 and 54       3.1
4:   55 and 56       2.3
5: 105 and 106       2.0
6: 107 and 108       6.5
7: 109 and 110       4.0
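
If the two-week periods should instead restart every calendar year (as the year-week key in the question may suggest), a possible variation is sketched below. This is an assumption about the intent and not part of the answer above. The bins are 14-day blocks counted from January 1 of each year, not pairs of data.table week() numbers, and the names res, year and two_week are only illustrative.

```r
library(data.table)

# Same data as in the question.
Date <- as.Date(c("1980-01-01", "1980-01-02", "1981-01-05", "1981-01-05",
                  "1982-01-08", "1982-01-15", "1980-01-16", "1980-01-17",
                  "1981-01-18", "1981-01-22", "1982-01-24", "1982-01-26"))
Runoff <- c(2, 1, 0.1, 3, 2, 5, 1.5, 0.5, 0.3, 2, 1.5, 4)
DT <- data.table(Date, Runoff)

# as.POSIXlt(x)$yday is the 0-based day of the year (0 on January 1),
# so %/% 14 gives a 0-based 14-day bin that restarts every year.
res <- DT[, .(sum_value = sum(Runoff)),
          keyby = .(year = year(Date),
                    two_week = as.POSIXlt(Date)$yday %/% 14L)][
          , .(year, weeks = paste(2 * two_week + 1, "and", 2 * two_week + 2), sum_value)]
res
```

With the question's data this gives two rows per year ("1 and 2" and "3 and 4"). The totals differ from (2) wherever a running 14-day block straddles a bin boundary of the per-year scheme. For example, 1982-01-26 now falls into the "3 and 4" bin of 1982.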

Update: Revised and added (2).

answered Sep 22 '25 09:09 by G. Grothendieck