What is the best method to bin intraday volume figures from a stock price timeseries using XTS / ZOO etc in R?

Tags: r, time-series, zoo, xts, quantitative-finance

For instance, let's say you have ~10 years of 1-minute intraday data for the volume of instrument x (in xts format), running from 9:30am to 4:30pm each day:

    Date.Time               Volume        
    2001-01-01 09:30:00     1200
    2001-01-01 09:31:00     1110
    2001-01-01 09:32:00     1303

All the way through to:

    2010-12-20 16:28:00     3200
    2010-12-20 16:29:00     4210
    2010-12-20 16:30:00     8303

I would like to:

- Get the average volume at each minute for the entire series (i.e. the average volume over all 10 years at 9:30, 9:31, 9:32 ... 16:28, 16:29, 16:30)

How should I best go about:

- Aggregating the data into one-minute buckets
- Getting the average of those buckets
- Reconstituting those "average" buckets back into a single xts/zoo time series?

I've had a good poke around with aggregate, sapply, period.apply functions etc, but just cannot seem to "bin" the data correctly.

It's easy enough to solve this with a loop, but that is very slow. I'd prefer to avoid an explicit loop and use a function that takes advantage of compiled code (i.e. an xts-based solution).

Can anyone offer some advice / a solution?

Thanks so much in advance.

asked Feb 24 '12 06:02

n.e.w

2 Answers

First, let's create some test data:

    library(xts)      # also pulls in zoo
    library(timeDate)
    library(chron)    # provides the times class

    # test data: two days, with 09:30 appearing on both
    x <- xts(1:3, timeDate(c("2001-01-01 09:30:00", "2001-01-01 09:31:00",
        "2001-01-02 09:30:00")))

1) aggregate.zoo. Convert the index to the times class and aggregate by time of day with this one-liner:

    aggregate(as.zoo(x), times(format(time(x), "%H:%M:%S")), mean)

1a) aggregate.zoo (variation). Alternatively, aggregate on the formatted character times and convert only the shorter aggregated series to times, which avoids doing the conversion on the longer original series:

    ag <- aggregate(as.zoo(x), format(time(x), "%H:%M:%S"), mean)
    zoo(coredata(ag), times(time(ag)))

2) tapply. An alternative is tapply, which is likely faster:

    ta <- tapply(coredata(x), format(time(x), "%H:%M:%S"), mean)
    zoo(unname(ta), times(names(ta)))

EDIT: simplified (1) and added (1a) and (2)
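To make the binning step concrete, here is a minimal base-R sketch of the tapply idea on a slightly larger sample. The vectors `stamps` and `volume` and their values are made up for illustration; with an xts object they would correspond to `index(x)` and `as.numeric(coredata(x))`. The UTC time zone is also an assumption.

```r
# Hypothetical sample: two trading days, three minutes each (UTC assumed)
stamps <- as.POSIXct(
  c("2001-01-01 09:30:00", "2001-01-01 09:31:00", "2001-01-01 09:32:00",
    "2001-01-02 09:30:00", "2001-01-02 09:31:00", "2001-01-02 09:32:00"),
  tz = "UTC"
)
volume <- c(1200, 1110, 1303, 1000, 1090, 1297)

# Bin key: the time of day as a string, dropping the date part
key <- format(stamps, "%H:%M:%S", tz = "UTC")

# One mean per distinct time of day, across all days
avg <- tapply(volume, key, mean)
print(avg)
```

Because zero-padded "HH:MM:SS" strings sort the same way alphabetically as chronologically, the names of `avg` come out in time-of-day order, so they can be passed straight to `times()` as in (2).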

answered Nov 04 '22 15:11

G. Grothendieck

Here is a solution with ddply, but you can probably also use sqldf, tapply, aggregate, by, etc.

    # Sample data
    minutes <- 10 * 60
    days <- 250 * 10
    d <- seq.POSIXt(
      ISOdatetime( 2011,01,01,09,00,00, "UTC" ),
      by="1 min", length.out=minutes
    )
    d <- outer( d, (1:days) * 24*3600, `+` )
    d <- sort(d)
    library(xts)
    d <- xts( round(100*rlnorm(length(d))), d )

    # Aggregate
    library(plyr)
    d <- data.frame(
      minute=format(index(d), "%H:%M"),
      value=as.numeric(coredata(d))  # plain vector rather than a one-column matrix
    )
    d <- ddply(
      d, "minute",
      summarize,
      value=mean(value, na.rm=TRUE)
    )

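    # Convert to zoo or xts
    zoo(x=d$value, order.by=d$minute) # The index does not have to be a date or time
    # Pass the format by name: the second positional argument of as.POSIXct is tz.
    # The date 2012-01-01 is an arbitrary placeholder.
    xts(x=d$value, order.by=as.POSIXct(sprintf("2012-01-01 %s:00", d$minute), format="%Y-%m-%d %H:%M:%S", tz="UTC"))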
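As a rough illustration of one of the alternatives mentioned above, the same grouping can be sketched with base R's aggregate() and no extra packages. The data frame `df`, its column names, the sample values, the UTC time zone and the placeholder date are all assumptions made for this example.

```r
# Hypothetical sample: two days, three minutes each (UTC assumed)
df <- data.frame(
  stamp = as.POSIXct(
    c("2011-01-03 09:00:00", "2011-01-03 09:01:00", "2011-01-03 09:02:00",
      "2011-01-04 09:00:00", "2011-01-04 09:01:00", "2011-01-04 09:02:00"),
    tz = "UTC"
  ),
  value = c(10, 20, 30, 30, 40, 50)
)

# Bin key: minute of the day, dropping the date part
df$minute <- format(df$stamp, "%H:%M", tz = "UTC")

# Average value per minute of the day
avg <- aggregate(value ~ minute, data = df, FUN = mean)

# Re-anchor each bucket on an arbitrary placeholder date so the result
# has a proper time index (format and tz are passed by name)
avg$stamp <- as.POSIXct(paste("2012-01-01", avg$minute),
                        format = "%Y-%m-%d %H:%M", tz = "UTC")
print(avg)
```

The resulting `avg$value` and `avg$stamp` columns could then be handed to xts() or zoo() in the same way as the ddply output.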

answered Nov 04 '22 17:11

Vincent Zoonekynd
