
pandas timeseries resampling ending a given day

I suspect many people working on timeseries data have already come across this issue, and pandas doesn't seem to provide a straightforward solution (yet!):

Suppose:

  1. You have a timeseries of daily data with Close prices, indexed by Date (day).
  2. Today is 19JUN. Last Close data value is 18JUN.
  3. You want to resample the daily data into OHLC bars, with some given frequency (let's say M or 2M) ending 18JUN.

So for M freq, last bar would be 19MAY-18JUN, previous one 19APR-18MAY, and so on...

ts.resample('M', how='ohlc')

will do the resampling, but 'M' is an end-of-month frequency, so the result gives a full month for 2014-05 and only a partial period (about 2 weeks) for 2014-06. The last bar is therefore not a 'monthly bar'. That's not what we want!
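The month-end behaviour is easy to reproduce. A minimal sketch, assuming a synthetic daily series ending 2014-06-18 (the data is made up for illustration) and the method-style `.ohlc()` call of newer pandas releases in place of `how='ohlc'`:

```python
import numpy as np
import pandas as pd

# Synthetic daily closes ending 18JUN (illustrative data, not from the question)
idx = pd.date_range(end="2014-06-18", periods=200)
ts = pd.Series(np.linspace(50, 100, 200), index=idx)

# Month-end bins: each bar is labelled with the last calendar day of its month
monthly = ts.resample(pd.offsets.MonthEnd()).ohlc()
days_per_bar = ts.resample(pd.offsets.MonthEnd()).count()

print(monthly.tail(2))
print(days_per_bar.tail(2))  # the 2014-06 bar only holds the days up to 18JUN
```

The last bar is labelled with the month end even though the data stops on the 18th, which is exactly the mismatch described above.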

With 2M frequency, given my sample timeseries, my test gives me a last bar labelled 2014-07-31 (and the previous one labelled 2014-05-31), which is quite misleading since there's no data in JUL. The supposed last 2-month bar again covers just the most recent 2 weeks.

The correct DatetimeIndex is easily created with:

pandas.date_range(end='2014-06-18', freq='2M', periods=300) + datetime.timedelta(days=18)

(Pandas documentation prefers to do the same thing via

pandas.date_range(end='2014-06-18', freq='2M', periods=300) + pandas.tseries.offsets.DateOffset(days=18)

but my tests show that this method, though more 'pandaïc', is 2x slower!)

Either way we can't apply the right DatetimeIndex to ts.resample().
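To see what these two expressions produce, here is a small sketch with `periods=5` instead of 300, using `pd.offsets.MonthEnd(2)` as the offset-object form of the '2M' alias. Note that the trick depends on the anchor day being 28 or less: month end plus `d` days only lands on day `d` of the next month if that month has at least `d` days.

```python
import datetime
import pandas as pd

# Month ends stepping back from the last month end before 2014-06-18
month_ends = pd.date_range(end="2014-06-18", freq=pd.offsets.MonthEnd(2), periods=5)

# Shift every month end forward by 18 days, once with each variant
edges_td = month_ends + datetime.timedelta(days=18)
edges_off = month_ends + pd.DateOffset(days=18)

print(edges_td)
```

Both variants give the same dates, all falling on the 18th, with the last one on 2014-06-18.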

It seems that the pandas dev team is aware of this issue (see "Date ranges in Pandas"), but in the meantime, how can you get OHLC bars with a rolling frequency anchored on the last day in the timeseries?

comte asked Jun 19 '14 10:06



1 Answer

This is basically hacked together from copy/paste, and I'm sure it fails in some cases, but below is some starting code for a custom Offset that is anchored to a particular day in a month.

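# Note: written against 2014-era pandas; as_datetime, as_timestamp and apply_nat
# are no longer importable from pandas.tseries.offsets in current releases.
from pandas.tseries.offsets import (as_datetime, as_timestamp, apply_nat,
                                    DateOffset, relativedelta, datetime)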
class MonthAnchor(DateOffset):
    """DateOffset Anchored to day in month

        Arguments:
        day_anchor: day to be anchored to
    """

    def __init__(self, n=1, **kwds):
        super(MonthAnchor, self).__init__(n)

        self.kwds = kwds
        self._dayanchor = self.kwds['day_anchor']

    @apply_nat
    def apply(self, other):
        n = self.n

        if other.day > self._dayanchor and n <= 0:  # then roll forward if n<=0
            n += 1
        elif other.day < self._dayanchor and n > 0:
            n -= 1

        other = as_datetime(other) + relativedelta(months=n)
        other = datetime(other.year, other.month, self._dayanchor)
        return as_timestamp(other)

    def onOffset(self, dt):
        return dt.day == self._dayanchor

    _prefix = ''
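The date arithmetic inside `apply` can be checked in isolation, without any pandas internals. A minimal sketch using only `datetime` and `dateutil`; the function name `shift_to_anchor` is hypothetical and simply mirrors the branching above:

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta

def shift_to_anchor(dt, day_anchor, n=1):
    """Move dt by n anchored months, landing on day_anchor (mirrors apply())."""
    if dt.day > day_anchor and n <= 0:
        n += 1  # already past the anchor: one fewer step back is needed
    elif dt.day < day_anchor and n > 0:
        n -= 1  # anchor still ahead in this month: one fewer step forward
    shifted = dt + relativedelta(months=n)
    # Raises ValueError if day_anchor does not exist in the target month
    return datetime(shifted.year, shifted.month, day_anchor)

print(shift_to_anchor(datetime(2014, 6, 18), 19))        # next anchor after 18JUN
print(shift_to_anchor(datetime(2014, 6, 25), 19, n=-1))  # previous anchor before 25JUN
```

One of the failing cases is visible here: an anchor day above 28 raises a `ValueError` as soon as the target month is too short.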

Example usage:

In [28]: df = pd.DataFrame(data=np.linspace(50, 100, 200), index=pd.date_range(end='2014-06-18', periods=200), columns=['value'])

In [29]: df.head()
Out[29]: 
                value
2013-12-01  50.000000
2013-12-02  50.251256
2013-12-03  50.502513
2013-12-04  50.753769
2013-12-05  51.005025


In [61]: month_offset = MonthAnchor(day_anchor = df.index[-1].day + 1)

In [62]: df.resample(month_offset, how='ohlc')
Out[62]: 
                value                                   
                 open        high        low       close
2013-11-19  50.000000   54.271357  50.000000   54.271357
2013-12-19  54.522613   62.060302  54.522613   62.060302
2014-01-19  62.311558   69.849246  62.311558   69.849246
2014-02-19  70.100503   76.884422  70.100503   76.884422
2014-03-19  77.135678   84.673367  77.135678   84.673367
2014-04-19  84.924623   92.211055  84.924623   92.211055
2014-05-19  92.462312  100.000000  92.462312  100.000000
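An alternative that avoids subclassing DateOffset is to compute the bar ends explicitly and group on them. A minimal sketch, assuming a date-sorted Series of closes; `anchored_ohlc` and its `months` parameter are hypothetical names, not pandas API. Unlike the output above, bars here are labelled by their end date, and the first bar may be partial.

```python
import numpy as np
import pandas as pd

def anchored_ohlc(close, months=1):
    """OHLC bars of `months` length; the last bar ends on the last date of `close`."""
    first, last = close.index[0], close.index[-1]
    # Right-inclusive bar ends, stepping back from the last date until
    # one edge falls before the first observation
    edges = [last]
    k = 1
    while edges[-1] >= first:
        edges.append(last - pd.DateOffset(months=months * k))
        k += 1
    edges = pd.DatetimeIndex(edges[::-1])
    # For each date, position of the first bar end on or after it
    pos = np.searchsorted(edges.values, close.index.values, side="left")
    labels = edges[pos]
    bars = close.groupby(labels).agg(["first", "max", "min", "last"])
    bars.columns = ["open", "high", "low", "close"]
    return bars

# Same synthetic data as in the example above
close = pd.Series(np.linspace(50, 100, 200),
                  index=pd.date_range(end="2014-06-18", periods=200))
bars = anchored_ohlc(close)
bars_2m = anchored_ohlc(close, months=2)
print(bars)
```

Each edge is computed from the last date rather than from the previous edge, so an anchor day that gets clipped in a short month does not drift for the earlier bars.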
chrisb answered Oct 17 '22 01:10
