For instance, suppose you have about 10 years of 1-minute volume data for instrument x, stored as an xts object and covering 9:30am to 4:30pm each day:
Date.Time Volume
2001-01-01 09:30:00 1200
2001-01-01 09:31:00 1110
2001-01-01 09:32:00 1303
All the way through to:
2010-12-20 16:28:00 3200
2010-12-20 16:29:00 4210
2010-12-20 16:30:00 8303
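To make the layout concrete, the rows quoted above can be built as an xts object roughly like this. It is a minimal sketch: only those six rows are reproduced, and the UTC time zone is an assumption because the question does not state one.

```r
library(xts)

# Only the six rows quoted above; the ~10 years in between are not reproduced.
# tz = "UTC" is an assumption: the question does not state a time zone.
idx <- as.POSIXct(c("2001-01-01 09:30:00", "2001-01-01 09:31:00",
                    "2001-01-01 09:32:00", "2010-12-20 16:28:00",
                    "2010-12-20 16:29:00", "2010-12-20 16:30:00"),
                  tz = "UTC")
vol <- xts(c(1200, 1110, 1303, 3200, 4210, 8303), order.by = idx)
colnames(vol) <- "Volume"

# ISO-8601 string subsetting selects a whole day
vol["2001-01-01"]
```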
I would like to:
How should I best go about:
I've had a good poke around with the aggregate, sapply and period.apply functions, etc., but just cannot seem to "bin" the data correctly.
It's easy enough to solve this with a loop, but that is very slow. I'd prefer to avoid an explicit loop and use a vectorised function that takes advantage of the package's compiled code (i.e. an xts-based solution).
Can anyone offer some advice / a solution?
Thanks so much in advance.
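A likely reason period.apply does not "bin" the data as intended: it applies a function over consecutive blocks between endpoints, so it can summarise per day but cannot pool the same minute across different days. The sketch below, on made-up toy values (hypothetical, not the question's data), contrasts daily endpoints with a time-of-day key.

```r
library(xts)

# Hypothetical toy data: two days, three minutes per day (values made up)
idx <- as.POSIXct(c("2001-01-01 09:30:00", "2001-01-01 09:31:00",
                    "2001-01-01 09:32:00", "2001-01-02 09:30:00",
                    "2001-01-02 09:31:00", "2001-01-02 09:32:00"),
                  tz = "UTC")
x <- xts(c(1200, 1110, 1303, 1000, 1010, 1100), order.by = idx)

# period.apply() works on consecutive blocks between endpoints, so with
# daily endpoints it returns one mean per *day*, not one per minute of day
ep <- endpoints(x, on = "days")
daily <- period.apply(x, INDEX = ep, FUN = colMeans)

# Binning by time of day needs a grouping key that repeats across days
tod <- format(index(x), "%H:%M")
by_minute <- tapply(as.numeric(coredata(x)), tod, mean)
by_minute
```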
eXtensible Time Series (xts) is a powerful package that provides an extensible time series class, enabling uniform handling of many R time series classes by extending zoo.
xts is both a constructor and a subclass that inherits behaviour from its parent: it extends the popular zoo class, so most zoo methods work for xts objects. An xts object is a matrix, and subsets always preserve the matrix form.
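A short illustration of the subclass and matrix behaviour just described, using a small hypothetical two-column series:

```r
library(xts)

# Hypothetical 3-row, 2-column series indexed by Date
x <- xts(matrix(1:6, ncol = 2, dimnames = list(NULL, c("a", "b"))),
         order.by = as.Date("2001-01-01") + 0:2)

class(x)              # "xts" "zoo": xts is a subclass of zoo
inherits(x, "zoo")    # TRUE, so zoo methods such as coredata() apply
dim(x[, "a"])         # 3 1: a single column stays a one-column matrix
dim(x[2, ])           # 1 2: a single row stays a one-row matrix
```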
First, let's create some test data:
library(xts) # also pulls in zoo
library(timeDate)
library(chron) # includes times class
# test data
x <- xts(1:3, timeDate(c("2001-01-01 09:30:00", "2001-01-01 09:31:00",
"2001-01-02 09:30:00")))
1) aggregate.zoo. Now try converting the time of day to the times class and aggregating with this one-liner:
aggregate(as.zoo(x), times(format(time(x), "%H:%M:%S")), mean)
1a) aggregate.zoo (variation). Or use this variation, which aggregates by the character time of day first and converts only the shorter aggregated series to times, avoiding the conversion on the longer original series:
ag <- aggregate(as.zoo(x), format(time(x), "%H:%M:%S"), mean)
zoo(coredata(ag), times(time(ag)))
2) tapply. An alternative would be tapply, which is likely faster:
ta <- tapply(coredata(x), format(time(x), "%H:%M:%S"), mean)
zoo(unname(ta), times(names(ta)))
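A possible refinement, not part of the original answer: xts provides the .indexhour() and .indexmin() helpers, so the grouping key can be an integer minute of day instead of a formatted string. This sketch assumes a POSIXct index in UTC rather than the timeDate index used above; whether it is actually faster on a large series has not been measured here.

```r
library(xts)

# Same values as the test data above, but indexed by POSIXct in UTC (assumption)
x <- xts(1:3, as.POSIXct(c("2001-01-01 09:30:00", "2001-01-01 09:31:00",
                           "2001-01-02 09:30:00"), tz = "UTC"))

# Integer minute-of-day key (minutes since midnight) from xts's index helpers;
# this avoids formatting every timestamp as a character string
mod <- 60L * .indexhour(x) + .indexmin(x)

ta <- tapply(as.numeric(coredata(x)), mod, mean)

# Index the result by minute of day; 570 is 09:30, 571 is 09:31
z <- zoo(as.vector(ta), as.integer(names(ta)))
z
```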
EDIT: simplified (1) and added (1a) and (2)
Here is a solution with ddply, but you can probably also use sqldf, tapply, aggregate, by, etc.
# Sample data
minutes <- 10 * 60
days <- 250 * 10
d <- seq.POSIXt(
ISOdatetime( 2011,01,01,09,00,00, "UTC" ),
by="1 min", length.out=minutes
)
d <- outer( d, (1:days) * 24*3600, `+` )
d <- sort(d)
library(xts)
d <- xts( round(100*rlnorm(length(d))), d )
# Aggregate
library(plyr)
d <- data.frame(
minute=format(index(d), "%H:%M"),
value=coredata(d)
)
d <- ddply(
d, "minute",
summarize,
value=mean(value, na.rm=TRUE)
)
# Convert to zoo or xts
zoo(x=d$value, order.by=d$minute) # The index does not have to be a date or time
xts(x=d$value, order.by=as.POSIXct(sprintf("2012-01-01 %s:00", d$minute), format="%Y-%m-%d %H:%M:%S"))
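Since aggregate is mentioned above as an alternative, here is a sketch of that route (not part of the original answer): base R's formula interface to aggregate() produces the same kind of per-minute means without plyr. The small simulated dataset is hypothetical.

```r
library(xts)
set.seed(1)

# Hypothetical small sample: 3 days x 5 one-minute bars of simulated volume
t0  <- as.POSIXct("2011-01-01 09:00:00", tz = "UTC")
idx <- t0 + as.vector(outer(0:4 * 60, 0:2 * 86400, `+`))
v   <- xts(round(100 * rlnorm(length(idx))), order.by = idx)

df <- data.frame(minute = format(index(v), "%H:%M"),
                 value  = as.numeric(coredata(v)))

# Base-R aggregate() with a formula: one row per minute of day
ag <- aggregate(value ~ minute, data = df, FUN = mean)
ag
```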