 

Avoiding for-loops when aggregating timeseries

I get why vectorised functions are better than for-loops.

But there are some problems where I can't see the vectorised functional programming solution. One of those is summing monthly data to get quarterly data. Any suggestions to replace this code ...

month <- 1:100
A422072L <- c(rep(NA, 4), rnorm(96, 100, 5) ) + 2 * month
A422070J <- c(NA, NA, rnorm(96, 100, 5), NA, NA) + 2 * month
Au.approvals <- data.frame(month=month, A422072L=A422072L, A422070J=A422070J)

Au.approvals$trend.sum.A422072L.qtr <- NA
Au.approvals$sa.sum.A422070J.qtr <- NA
for(i in seq_len(nrow(Au.approvals)))
{
    if(i < 3) next
    if(all(!is.na(Au.approvals$A422072L[(i-2):i])))
        Au.approvals$trend.sum.A422072L.qtr[i] <- sum(Au.approvals$A422072L[(i-2):i])
    if(all(!is.na(Au.approvals$A422070J[(i-2):i])))
        Au.approvals$sa.sum.A422070J.qtr[i]    <- sum(Au.approvals$A422070J[(i-2):i])
}

print(Au.approvals)

Now with enough data to run as an example.

Mark Graph asked Feb 28 '26 14:02

2 Answers

Let's create some bogus timeseries:

time_dat = data.frame(t = 1:100, value = runif(100))

To get a rolling sum, take a look at rollapply from the zoo package:

require(zoo)
time_dat = transform(time_dat, 
                     roll_value = rollapply(value, 10, sum, fill = NA))

Here I assume that the coarser resolution (quarterly) is 10 times coarser than the finer resolution. Note that fill = NA pads the positions where the window does not fit with NA; filling with anything else would silently distort the sums at the edges of the series.
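For the exact quarterly case in the question, a sketch with a trailing window of 3 and align = "right" reproduces the loop's behaviour, including the NA values in the first two rows where a full quarter is not yet available:

```r
library(zoo)

# same shape as the question's trend series, minus the noise
month <- 1:12
value <- 2 * month

# trailing 3-month sum; the first two entries are NA, like the loop's `next`
qtr_sum <- rollapply(value, width = 3, FUN = sum, align = "right", fill = NA)
qtr_sum
#  NA NA 12 18 24 30 36 42 48 54 60 66
```

The default align = "center" would instead centre each window on its row, which does not match the loop in the question.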


Original answer for a non-rolling mean:

I like to use the functions from the plyr package, but ave, aggregate, and data.table are also good options. For large datasets, data.table is very fast. But to get back to some plyr magic:

First create an additional column which specifies the coarser time frequency, i.e. which quarter your observation is in:

time_dat[["coarse_t"]] = rep(1:10, each = 10)
> head(time_dat)
  t     value coarse_t
1 1 0.9045097        1
2 2 0.4174182        1
3 3 0.5638139        1
4 4 0.8228698        1
5 5 0.7059027        1
6 6 0.5285386        1

Now we can aggregate time_dat for the coarser time frequency:

time_dat_coarse = ddply(time_dat, .(coarse_t), summarise, sum_value = sum(value))
> time_dat_coarse
   coarse_t sum_value
1         1  6.097348
2         2  4.834720
3         3  3.988809
4         4  4.170656
5         5  4.538269
6         6  6.198716
7         7  4.399282
8         8  5.507384
9         9  6.089072
10       10  4.663287
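As a sketch of the base-R alternative mentioned above, aggregate does the same grouping with no extra packages, using the same coarse_t column:

```r
set.seed(1)
time_dat <- data.frame(t = 1:100, value = runif(100))
time_dat$coarse_t <- rep(1:10, each = 10)

# sum `value` within each coarse period; one row per coarse_t level
time_dat_coarse <- aggregate(value ~ coarse_t, data = time_dat, FUN = sum)
nrow(time_dat_coarse)
# 10
```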

Paul Hiemstra answered Mar 02 '26 07:03


Paul's answer was great, but I just wanted to add that the chron package has many excellent operations for date/time classification, which can be paired with plyr for aggregation:

library("chron") 
# chron uses a chron-specific object representation. 
# If a different representation is needed, a conversion is necessary,
# e.g. if a$date is a chron date object, use as.POSIXct(a$date) to get a POSIXct representation

# create chron date objects and values (1000 of each, so lengths match)
a <- data.frame(date = as.chron(Sys.Date() + 1:1000), value = 1:1000 * runif(1000, 0, 1))

# cuts dates into 15 intervals
a$interval1<-cut(a$date,15)
# cuts dates into 10 intervals using labels you define
a$interval2<-cut(a$date,10,paste("group",1:10))
# cuts dates into weeks
a$weeks<-cut(a$date,"weeks",start.on.monday=FALSE)
# cuts dates into months
a$months<-cut(a$date,"months")
# cuts dates into years
a$years<-cut(a$date,"years")
# classifies day based on day of week
a$day_of_week<-day.of.week(a$date)

# creating a chron time object
b <- data.frame(day_time = as.chron(Sys.time() + 1:1000 * 100), value = 1:1000 * runif(1000, 0, 1))
# cuts times into days - note: uses first time period as the start
b$day<-cut(b$day_time,"days")
# truncates time to 5 minute interval
b$min_5<-trunc(b$day_time, "00:05:00")
# truncates time to 1 hour intervals
b$hour1<-trunc(b$day_time, "01:00:00")
# truncates datetime to 1 hour and 2 second intervals
b$days_3<-trunc(b$day_time, "01:00:02")

I use chron a lot because it makes time aggregations much easier.
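Putting the two pieces together, a sketch of the pairing described above: the months column produced by cut feeds straight into a plyr aggregation, giving one summed row per calendar month:

```r
library(chron)
library(plyr)

# chron dates covering ~33 months, with a value per day
a <- data.frame(date = as.chron(Sys.Date() + 1:1000), value = runif(1000))
a$months <- cut(a$date, "months")

# sum `value` within each month level produced by cut
monthly <- ddply(a, .(months), summarise, sum_value = sum(value))
```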

For additional awesomeness, the zoo and xts packages have many more functions which are great for various aggregations past the day level of detail. Their documentation is huge and it may be hard to find what you want, but pretty much everything you want is there. Some highlights:

library("zoo")
library("xts")
?rollapply
?rollsum
?rollmean
?rollmedian
?rollmax
?yearmon
?yearqtr
?apply.daily
?apply.weekly
?apply.monthly
?apply.quarterly
?apply.yearly
?to.minutes
?to.minutes3
?to.minutes5
?to.minutes10
?to.minutes15
?to.minutes30
?to.hourly
?to.daily
?to.weekly
?to.monthly
?to.quarterly
?to.yearly
?to.period
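As one concrete sketch from that list, apply.quarterly in xts does the question's monthly-to-quarterly sum directly, assuming a regular monthly series indexed by date:

```r
library(xts)

# two years of monthly values (2, 4, ..., 48) indexed by month-start dates
dates <- seq(as.Date("2010-01-01"), by = "month", length.out = 24)
x <- xts(2 * 1:24, order.by = dates)

# sum within each calendar quarter -> 8 quarterly values; Q1 2010 is 2+4+6 = 12
qtr <- apply.quarterly(x, sum)
```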
TheDarkSaint answered Mar 02 '26 07:03


