to.minutes using custom endpoints

Tags: r, time-series, xts

I am using intraday data that starts at 09:50 and would like to convert it into 20-minute bars, so that the first period runs from 09:50:00 to 10:09:59, the second from 10:10:00 to 10:29:59, and so on. However, to.minutes() from the xts package seems to anchor the bars to the hour, producing bars that end at 09:59:59, 10:19:59, etc., i.e. 10 minutes out. I know it's probably not a common request, but is there any way of doing this so that it has the correct endpoints, i.e. basing them on the first timestamp?

And for bonus points: is there a way to do it based on the final timestamp (i.e. generating the period endpoints going backwards from that timestamp)?
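To pin down the two bar layouts asked for above, here is a minimal base-R sketch (no xts involved) that labels every timestamp with the end of its 20-minute bar, either counting forward from the first tick or backward from the last tick. The helper names `bar_end_from_first` and `bar_end_from_last` are hypothetical, and the timestamps are assumed to be a sorted POSIXct vector.

```r
# Hypothetical helpers (not part of xts): label each timestamp with the end of its bar.
# Assumes 'times' is a sorted POSIXct vector; 'width' is the bar length in seconds.

# Bars anchored at the FIRST tick: [t0, t0 + width), [t0 + width, t0 + 2*width), ...
bar_end_from_first <- function(times, width = 1200) {
  elapsed <- as.numeric(times) - as.numeric(times[1])  # seconds since the first tick
  id <- elapsed %/% width                              # 0, 1, 2, ... per bar
  times[1] + (id + 1) * width - 1                      # last second of each bar, e.g. 10:09:59
}

# Bars anchored at the LAST tick, generated backwards: (tN - width, tN], (tN - 2*width, tN - width], ...
bar_end_from_last <- function(times, width = 1200) {
  tN <- times[length(times)]
  remaining <- as.numeric(tN) - as.numeric(times)      # seconds until the last tick
  id <- remaining %/% width                            # 0 for the final bar, 1 for the one before, ...
  tN - id * width                                      # bar ends spaced 'width' seconds apart
}
```

With 1-second data starting at 09:50:00, the forward version reproduces the 09:50:00 to 10:09:59 layout; the backward version makes the final bar end exactly on the final tick.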

Here is an illustration of my point about the bars being 10 minutes out from what I want:

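library(xts)
# one day of 1-second ticks: the first is at 09:50:00 yesterday, the last at 09:49:59 today
x <- xts(rnorm(24*60*60), as.POSIXct(format(paste(Sys.Date(),'09:50')))-((24*60*60):1))
head(x)
x1 <- to.minutes(x, 20)   # 20-minute OHLC bars; these end at 09:59:59, 10:19:59, ...
head(x1)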

I can think of a way to correct this using split(), cut(), lapply(), do.call() and rbind(), but I would basically be re-creating an OHLC object, and I suspect it would be inefficient compared to existing solutions.
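For comparison, the hand-rolled route might look roughly like this: a minimal base-R sketch that buckets by elapsed time since the first tick, then uses split(), lapply(), do.call() and rbind() to build Open/High/Low/Close columns. The function name `ohlc_from_first` is hypothetical, sorted POSIXct timestamps are assumed, and it returns a plain data.frame rather than an xts object.

```r
# Hypothetical helper: 20-minute OHLC bars anchored at the first timestamp (base R only).
# Assumes 'times' is sorted POSIXct and 'values' is a numeric vector of the same length.
ohlc_from_first <- function(times, values, width = 1200) {
  # bucket id counted from the first tick: 0 for [t0, t0 + width), 1 for the next bar, ...
  id <- (as.numeric(times) - as.numeric(times[1])) %/% width
  # summarise each bucket as Open/High/Low/Close
  bars <- lapply(split(values, id), function(v)
    c(Open = v[1], High = max(v), Low = min(v), Close = v[length(v)]))
  ohlc <- do.call(rbind, bars)
  # stamp each bar with the time of its last tick (e.g. 10:09:59 for the first bar)
  last_tick <- as.integer(tapply(seq_along(times), id, max))
  data.frame(time = times[last_tick], ohlc, row.names = NULL)
}
```

This makes every step explicit, which is handy for checking results, but it builds intermediate lists per bar, so it is likely slower than an endpoint-based approach on large series.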

asked Oct 05 '12 06:10 by h.l.m



2 Answers

Here is a useful trick that should maybe be more prominent in the xts documentation.

Start with an irregular xts object:

R> library(xts)
R> set.seed(42)   ## fix seed
R> X <- xts(cumsum(rnorm(100))+100, order.by=Sys.time()+cumsum(runif(100)))
R> head(X)
                              [,1]
2012-10-05 06:42:20.299761 101.371
2012-10-05 06:42:20.816872 100.806
2012-10-05 06:42:21.668803 101.169
2012-10-05 06:42:22.111599 101.802
2012-10-05 06:42:22.269479 102.207
2012-10-05 06:42:22.711804 102.100

Given this irregular series, we want to sample it at regular intervals that we impose. Here I create a two-second grid; any other interval would work, as long as the grid has the same type as the index (here POSIXct).

R> ind <- seq(start(X) - as.numeric(start(X)-round(start(X))) + 1, 
+             end(X), by="2 secs")
R> head(ind)
[1] "2012-10-05 06:42:21 CDT" "2012-10-05 06:42:23 CDT" 
[3] "2012-10-05 06:42:25 CDT" "2012-10-05 06:42:27 CDT" 
[5] "2012-10-05 06:42:29 CDT" "2012-10-05 06:42:31 CDT"
R> 
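The grid-building step can also be written with ceiling(), which may be easier to read. This is a minimal base-R sketch rather than the answer's own expression (the two can differ by a second for some start times): the hypothetical `regular_grid` starts at the first whole second at or after the first observation and steps by a fixed number of seconds.

```r
# Hypothetical helper (base R): a regular POSIXct grid with 'step' seconds between points,
# starting at the first whole second at or after the first observation.
regular_grid <- function(times, step = 2) {
  t0 <- as.numeric(times[1])
  first_point <- times[1] + (ceiling(t0) - t0)        # snap up to a whole second, keeping the time zone
  seq(first_point, times[length(times)], by = step)   # a numeric 'by' is taken as seconds
}
```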

The trick now is to merge the regular series with the irregular one, call na.locf() on the result to carry the last good irregular observation forward onto the new time grid, and then subset at the time grid:

R> na.locf(merge(X, xts(,ind)))[ind]
                           X
2012-10-05 06:42:21 100.8063
2012-10-05 06:42:23 102.1004
2012-10-05 06:42:25 105.4730
2012-10-05 06:42:27 107.2635
2012-10-05 06:42:29 104.9588
2012-10-05 06:42:31 101.7505
2012-10-05 06:42:33 104.6884
2012-10-05 06:42:35 103.6441
2012-10-05 06:42:37 101.6476
2012-10-05 06:42:39  98.6246
2012-10-05 06:42:41  97.9922
2012-10-05 06:42:43  97.7545
2012-10-05 06:42:45 101.0187
2012-10-05 06:42:47  98.0331
2012-10-05 06:42:49 100.7752
2012-10-05 06:42:51 103.0702
2012-10-05 06:42:53 102.6578
2012-10-05 06:42:55 103.1342
2012-10-05 06:42:57 103.4714
2012-10-05 06:42:59 102.3683
2012-10-05 06:43:01 105.0394
2012-10-05 06:43:03 103.9775
R> 

Voila.
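Outside xts, the same last-observation-carried-forward sampling can be expressed with base R's findInterval(), which returns, for each grid point, the position of the last observation at or before it. This is a minimal sketch, not how xts implements merge() or na.locf(); `locf_on_grid` is a hypothetical name, and the observation times are assumed to be sorted.

```r
# Hypothetical helper (base R): sample an irregular series on a regular grid by carrying
# the last observation forward, analogous to na.locf(merge(X, xts(, ind)))[ind].
locf_on_grid <- function(times, values, grid) {
  # position of the last observation at or before each grid point (0 = none yet)
  pos <- findInterval(as.numeric(grid), as.numeric(times))
  out <- rep(NA_real_, length(grid))       # NA where nothing has been observed yet
  out[pos > 0] <- values[pos[pos > 0]]
  data.frame(time = grid, value = out)
}
```

Grid points that precede the first observation come back as NA in this sketch.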

answered Nov 13 '22 22:11 by Dirk Eddelbuettel



I had a similar challenge recently (splitting FX data by the 5pm day start). Starting with your test data:

library(xts)
set.seed(42)
x <- xts(rnorm(24*60*60), as.POSIXct(format(paste(Sys.Date(),'09:50')))-((24*60*60):1))

Shift the index back 10 minutes, build the bars, then shift the bars forward 10 minutes:

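offset <- 600                    # seconds to shift (10 minutes)
index(x) <- index(x) - offset    # shift every timestamp back 10 minutes
x1 <- to.minutes(x, 20)          # 20-minute bars on the shifted, clock-aligned times
index(x1) <- index(x1) + offset  # shift the bar timestamps forward again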

(NB: this modifies x itself; either work on a copy, or run index(x) <- index(x) + offset afterwards to restore it.) x1 looks like:

                        x.Open   x.High     x.Low    x.Close
2012-10-06 10:09:59  1.3709584 3.495304 -3.371739  0.4408241
2012-10-06 10:29:59 -0.7465165 3.584659 -2.828475  0.5938161
2012-10-06 10:49:59  1.3275046 3.174520 -3.199558 -0.6273660
...
2012-10-07 09:09:59 -0.83742490 3.103466 -3.251721 -1.093380
2012-10-07 09:29:59 -0.48464537 3.228048 -3.113351 -1.572931
2012-10-07 09:49:59  1.90503697 3.420940 -3.505207  2.832325
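The shift-aggregate-unshift idea can be checked without xts. The minimal base-R sketch below (hypothetical `shifted_bar_end`, sorted whole-second POSIXct input assumed) moves every timestamp back by `offset`, assigns it to a clock-aligned 20-minute bucket counted in seconds since 1970, and moves the bucket's last second forward again. It computes nominal bar ends rather than the last-tick labels that to.minutes() appears to use in the output above; with complete 1-second data like this, the two coincide.

```r
# Hypothetical helper (base R): bar end times after shifting by 'offset' seconds.
shifted_bar_end <- function(times, offset = 600, width = 1200) {
  shifted <- as.numeric(times) - offset         # move every timestamp back by 'offset'
  id <- shifted %/% width                       # clock-aligned bucket (seconds since 1970)
  bar_end <- (id + 1) * width - 1 + offset      # last second of the bucket, moved forward again
  .POSIXct(bar_end, tz = attr(times, "tzone"))  # back to POSIXct in the input's time zone
}
```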

The magic number 600 comes from your last tick at 09:49:59: the next bar would start one second later, at 09:50:00, which is 600 seconds past the previous 20-minute boundary (09:40:00). Here is how you can calculate it dynamically:

offset <- ( as.integer(last(index(x))) %% 1200 ) + 1

as.integer converts the time of the last tick into seconds since 1970. (Use as.numeric instead if your timestamps have millisecond precision.) %% 1200 then gives the number of seconds elapsed since the previous 20-minute boundary. Finally, the +1 is needed because to.minutes treats XX:XX:00 as the start of one bar, not the end of the previous bar.
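The arithmetic can be wrapped in a small helper for experimenting. This is a minimal base-R sketch with a hypothetical name, `bar_offset`. The "last" anchor is the answer's formula; the "first" anchor, which drops the +1 because the first tick opens a bar rather than closing one, is an extrapolation for the question's original first-timestamp case rather than something stated in the answer.

```r
# Hypothetical helper (base R): seconds to shift so that bars line up with a chosen tick.
# anchor = "last":  the answer's formula, (secs %% width) + 1
# anchor = "first": assumed variant for the first tick, secs %% width (no +1)
bar_offset <- function(t, width = 1200, anchor = c("last", "first")) {
  anchor <- match.arg(anchor)
  r <- as.numeric(t) %% width          # seconds elapsed since the previous clock boundary
  if (anchor == "last") r + 1 else r   # the last tick closes a bar; the first tick opens one
}
```

For the thread's test data both anchors give 600, since the series spans exactly 24 hours, a whole number of 20-minute bars.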

answered Nov 13 '22 22:11 by Darren Cook
