I have an irregular time series of events (posts) using xts, and I want to calculate the number of events that occur over a rolling weekly window (or biweekly, or 3-day, etc.). The data looks like this:
postid
2010-08-04 22:28:07 867
2010-08-04 23:31:12 891
2010-08-04 23:58:05 901
2010-08-05 08:35:50 991
2010-08-05 13:28:02 1085
2010-08-05 14:14:47 1114
2010-08-05 14:21:46 1117
2010-08-05 15:46:24 1151
2010-08-05 16:25:29 1174
2010-08-05 23:19:29 1268
2010-08-06 12:15:42 1384
2010-08-06 15:22:06 1403
2010-08-07 10:25:49 1550
2010-08-07 18:58:16 1596
2010-08-07 21:15:44 1608
which should produce something like
nposts
2010-08-05 00:00:00 10
2010-08-06 00:00:00 9
2010-08-07 00:00:00 5
for a 2-day window. I have looked into rollapply, apply.rolling from PerformanceAnalytics, etc., and they all assume regular time series data. I tried changing all of the times to just the day the post occurred and using something like ddply to group on each day, which gets me close. However, a user might not post every day, so the time series will still be irregular. I could fill in the gaps with 0s, but that might inflate my data a lot, and it's already quite large.
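The day-collapsing idea described above can be sketched in base R without ddply. This is a minimal sketch, assuming the timestamps are UTC (the answer's dput() stores them as UTC epoch seconds). Passing table() a factor whose levels cover every calendar day in range reports days without posts as 0, so the zero-filled vector has one entry per calendar day, not one per post. The rolling sum is taken as a difference of cumulative sums; the names `daily`, `k` and `rolling` are illustrative, and each window is labeled by its last day.

```r
# Post timestamps from the listing above, assumed to be in UTC
times <- as.POSIXct(c(
  "2010-08-04 22:28:07", "2010-08-04 23:31:12", "2010-08-04 23:58:05",
  "2010-08-05 08:35:50", "2010-08-05 13:28:02", "2010-08-05 14:14:47",
  "2010-08-05 14:21:46", "2010-08-05 15:46:24", "2010-08-05 16:25:29",
  "2010-08-05 23:19:29", "2010-08-06 12:15:42", "2010-08-06 15:22:06",
  "2010-08-07 10:25:49", "2010-08-07 18:58:16", "2010-08-07 21:15:44"),
  tz = "UTC")

# Collapse each post to its calendar day
days <- as.Date(times, tz = "UTC")
all_days <- seq(min(days), max(days), by = "day")

# Every day in the range is a factor level, so table() reports days
# with no posts as 0 (this is the zero-filling step)
daily <- as.vector(table(factor(as.character(days),
                                levels = as.character(all_days))))

# k-day rolling sum: daily[i - k + 1] + ... + daily[i] for each i >= k
# (assumes the data spans at least k days)
k <- 2
cs <- cumsum(daily)
rolling <- cs[k:length(cs)] - c(0L, head(cs, -k))
names(rolling) <- as.character(all_days[k:length(all_days)])
rolling  # 10, 9, 5 for the windows ending 2010-08-05, -06, -07
```

Setting `k <- 7` gives the weekly window. Note that the zero-filled vector grows with the number of calendar days spanned rather than with the number of posts.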
What should I do?
Here's a solution using xts:
library(xts)  # also attaches zoo, which provides rollapply()
# sample data; the index values are UTC epoch seconds, so the time zone is
# set to "UTC" (instead of the local zone, "") so days split at UTC midnight
x <- structure(c(867L, 891L, 901L, 991L, 1085L, 1114L, 1117L, 1151L,
1174L, 1268L, 1384L, 1403L, 1550L, 1596L, 1608L), .Dim = c(15L, 1L),
index = structure(c(1280960887, 1280964672, 1280966285,
1280997350, 1281014882, 1281017687, 1281018106, 1281023184, 1281025529,
1281050369, 1281096942, 1281108126, 1281176749, 1281207496, 1281215744),
tzone = "UTC", tclass = c("POSIXct", "POSIXt")), class = c("xts", "zoo"),
.indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct", "POSIXt"),
.indexTZ = "UTC", tzone = "UTC")
# first count the number of observations each day
xd <- apply.daily(x, length)
# now sum the counts over a 2-day rolling window
x2d <- rollapply(xd, 2, sum)
# align times at the end of the period (if you want)
y <- align.time(x2d, n=60*60*24) # n is in seconds
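One caveat with the code above: rollapply(xd, 2, sum) sums two consecutive rows of xd, that is, the two most recent days that had posts. If a user skips a day, that window silently spans more than two calendar days. A minimal base R sketch that measures the window in elapsed time instead, with no gap filling, follows. `count_in_window` is a hypothetical helper, timestamps are assumed to be UTC, and each window is taken as the half-open interval (end - width, end]. Because findInterval() on a sorted vector returns how many elements are <= each query point, the count inside a window is the difference of two findInterval() calls.

```r
# Hypothetical helper: number of events in (end - width, end] for each end.
# Works directly on irregular timestamps; no zero-filling is needed.
count_in_window <- function(times, ends, width) {
  t <- sort(as.numeric(times))             # event times, seconds since epoch
  e <- as.numeric(ends)                    # window end points
  w <- as.numeric(width, units = "secs")   # window width in seconds
  # findInterval(v, t) counts the elements of sorted t that are <= v
  findInterval(e, t) - findInterval(e - w, t)
}

# Post timestamps from the question, assumed to be in UTC
times <- as.POSIXct(c(
  "2010-08-04 22:28:07", "2010-08-04 23:31:12", "2010-08-04 23:58:05",
  "2010-08-05 08:35:50", "2010-08-05 13:28:02", "2010-08-05 14:14:47",
  "2010-08-05 14:21:46", "2010-08-05 15:46:24", "2010-08-05 16:25:29",
  "2010-08-05 23:19:29", "2010-08-06 12:15:42", "2010-08-06 15:22:06",
  "2010-08-07 10:25:49", "2010-08-07 18:58:16", "2010-08-07 21:15:44"),
  tz = "UTC")

# 2-day windows ending at each UTC midnight from Aug 6 to Aug 8
ends <- seq(as.POSIXct("2010-08-06", tz = "UTC"),
            as.POSIXct("2010-08-08", tz = "UTC"), by = "day")
two_day <- count_in_window(times, ends, as.difftime(2, units = "days"))

# Trailing 24-hour count evaluated at every post's own timestamp
per_post <- count_in_window(times, times, as.difftime(24, units = "hours"))
```

Here each count is labeled by the end of its window, so the window covering Aug 4 and Aug 5 appears at 2010-08-06 00:00:00; shift `ends` if a different labeling is wanted. A width of `as.difftime(7, units = "days")` gives the weekly window, and since `ends` can be any set of evaluation points, the output size is chosen independently of the data. The counts can be wrapped back into an xts object with `xts(two_day, order.by = ends)` if needed.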