
aggregating time series in R

I have the following OHLC data (by 3-minute intervals)

library(tseries)
library(xts)
library(quantmod)
> str(tickmin)
An ‘xts’ object from 2010-06-30 15:47:00 to 2010-09-08 15:14:00 containing:
  Data: num [1:8776, 1:5] 9215 9220 9205 9195 9195 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:5] "zv.Open" "zv.High" "zv.Low" "zv.Close" ...
  Indexed by objects of class: [POSIXct,POSIXt] TZ: 
  xts Attributes:  
 NULL


>tickmin
2010-09-08 15:02:00        20
2010-09-08 15:04:00        77
2010-09-08 15:08:00        86
2010-09-08 15:11:00         7
2010-09-08 15:14:00        43
> start(tickmin)
[1] "2010-06-30 15:47:00 EDT"
> end(tickmin)
[1] "2010-09-08 15:14:00 EDT"

I am trying to aggregate it using the following:

> by <-timeSequence(from = start(tickmin), to = end(tickmin), format="%Y-%m-%d %H%M", by = "day")
>by
[61] [2010-08-29 19:47:00] [2010-08-30 19:47:00] [2010-08-31 19:47:00]
[64] [2010-09-01 19:47:00] [2010-09-02 19:47:00] [2010-09-03 19:47:00]
[67] [2010-09-04 19:47:00] [2010-09-05 19:47:00] [2010-09-06 19:47:00]
[70] [2010-09-07 19:47:00]

> aggregate(Vo(tickmin),by,sum)
Error: length(time(x)) == length(by[[1]]) is not TRUE

I would appreciate any suggestions on how to fix the error.

asked Nov 28 '22 09:11 by itcplpl



1 Answer

I'll explain your error and tell you how to fix it, but there's a better way to do what you're doing. So make sure you read my entire answer!

The error message says that your by is not the same length as Vo(tickmin). aggregate needs by to contain one value per observation in tickmin (the day that observation falls on), not one value per day.

As an example here I generate an xts object:

library(xts)
library(quantmod)  # provides Vo(), used further down
# generate a set of times from 2010-06-30 onwards at 20-minute intervals, for 30 days
tms <- as.POSIXct(seq(0, 3600*24*30, by = 60*20), origin = "2010-06-30")
n   <- length(tms)
# generate random volumes (integers from 1 to 100) for those intervals, turn into an xts object
xts.ts <- xts(sample.int(100, n, replace = TRUE), tms)
colnames(xts.ts) <- 'Volume'

which yields:

> head(xts.ts)
                    Volume
2010-06-30 00:00:00     97
2010-06-30 00:20:00     78
2010-06-30 00:40:00     38
2010-06-30 01:00:00     86
2010-06-30 01:20:00     79
2010-06-30 01:40:00     55

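To access the timestamps of xts.ts you use index(xts.ts), which returns a vector of POSIXct date-times that print like "2010-07-30 00:00:00 EST".

To reduce these to calendar dates you can use as.Date. It drops the time of day rather than rounding, and for POSIXct input it uses UTC by default, which is why the first date below is 2010-06-29 even though the data starts on 2010-06-30 local time: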

> as.Date(index(xts.ts))
   [1] "2010-06-29" "2010-06-29" "2010-06-29" "2010-06-29" "2010-06-29"
    ....

Solution to your problem

Then to use aggregate you do:

> aggregate(Vo(xts.ts),as.Date(index(xts.ts)),sum)

2010-06-29 1858
2010-06-30 3733
2010-07-01 3906
2010-07-02 3359
2010-07-03 3838
...
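The length rule can be checked on a small example. This is only a sketch on made-up data (the object names vol, day_seq and days are hypothetical, and the index is put in UTC to keep time zones out of the picture): a sequence of day boundaries has one value per day and is rejected, while a grouping vector has one value per observation and is accepted.

```r
library(xts)

# Hypothetical data: one observation every 20 minutes for 3 days, in UTC.
tms <- seq(as.POSIXct("2010-06-30 00:00:00", tz = "UTC"),
           by = "20 min", length.out = 3 * 72)
vol <- xts(rep(1, length(tms)), order.by = tms)
colnames(vol) <- "Volume"

# A sequence of day boundaries has one value per DAY, not per observation,
# so aggregate() fails its length check (the error from the question).
day_seq <- seq(start(vol), end(vol), by = "day")
bad <- tryCatch(aggregate(vol, day_seq, sum), error = function(e) e)
length(day_seq)          # much shorter than nrow(vol)
inherits(bad, "error")

# A grouping vector has one value per OBSERVATION: the date it falls on.
days  <- as.Date(index(vol))
daily <- aggregate(vol, days, sum)
daily
```

Here every day holds 72 observations of 1, so each daily total should be 72.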

Better solution to your problem

The xts package has functions apply.daily, apply.monthly, etc. (use ls('package:xts') to list everything it exports; there may be others you're interested in).

apply.daily(x,FUN,...) does exactly what you want. See ?apply.daily. To use it you can do:

> apply.daily(xts.ts,sum)

                    Volume
2010-06-30 23:40:00   4005
2010-07-01 23:40:00   4093
2010-07-02 23:40:00   3419
2010-07-03 23:40:00   3737
...

Or if your xts object has other columns like Open, Close etc, you can do apply.daily(xts.ts, function(x) sum(Vo(x))).
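As a rough illustration of the multi-column case, here is a sketch on invented 3-minute bars (the object name bars and all values are made up; the column names only mimic an OHLC layout). It selects the volume column by name instead of using Vo(), so it needs nothing beyond xts:

```r
library(xts)

# Hypothetical 3-minute bars: 10 bars on each of two consecutive days (UTC).
tms  <- seq(as.POSIXct("2010-09-07 09:30:00", tz = "UTC"),
            by = "3 min", length.out = 10)
tms  <- c(tms, tms + 24 * 3600)
bars <- xts(cbind(Open = 100, High = 101, Low = 99, Close = 100, Volume = 1:20),
            order.by = tms)

# Sum only the volume column within each day; the result is an xts object
# stamped with the last observation of each day.
daily_vol <- apply.daily(bars[, "Volume"], sum)
daily_vol
```

With these numbers the first day sums the volumes 1 to 10 and the second day 11 to 20.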

Note that the two methods give slightly different answers. The most likely reason is the time zone: as.Date on a POSIXct index uses UTC by default, so the aggregate version went from midnight to midnight in UTC (hence the 2010-06-29 group), whereas apply.daily splits at the last observation of each day in the index's own time zone (hence the 23:40:00 timestamps).
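The difference can be reproduced, and removed, on a small example. This is a sketch under assumptions: the zone name "America/New_York" is chosen only because the question's timestamps print as EDT, the data is random, and the object names are hypothetical. Passing tz to as.Date makes aggregate group by local calendar days, which should then match apply.daily.

```r
library(xts)

tz  <- "America/New_York"   # assumed zone, for illustration only
tms <- seq(as.POSIXct("2010-06-30 00:00:00", tz = tz),
           by = "20 min", length.out = 3 * 72)
set.seed(1)
vol <- xts(sample.int(100, length(tms), replace = TRUE), order.by = tms)
colnames(vol) <- "Volume"

# Grouping by UTC dates (the as.Date default for POSIXct input).
by_utc   <- aggregate(vol, as.Date(index(vol)), sum)
# Grouping by dates computed in the index's own time zone.
by_local <- aggregate(vol, as.Date(index(vol), tz = tz), sum)
# apply.daily splits on days in the index's time zone.
by_apply <- apply.daily(vol, sum)

length(index(by_utc))     # one extra group: the UTC day boundary cuts a local day
length(index(by_local))
all.equal(as.numeric(coredata(by_local)), as.numeric(coredata(by_apply)))
```

If the two results still differ on your data, compare index(by_local) with index(by_apply) to see where the day boundaries fall.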

Looking at your question, apply.daily seems to match most closely what you want to do (and is provided with xts anyway, so why not use it?)

answered Dec 05 '22 13:12 by mathematical.coffee
