
Can pandas.DatetimeIndex remember whether it is closed?

I have a pandas.DatetimeIndex for an interval ['2018-01-01', '2018-01-04') (start included, end excluded) and freq=1D:

>>> index = pd.DatetimeIndex(start='2018-01-01',
...                          end='2018-01-04',
...                          freq='1D',
...                          closed='left')
>>> index
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03'],
              dtype='datetime64[ns]',
              freq='D')

How can I obtain the correct open end='2018-01-04' attribute again? I need it for a DB query with timestamp ranges.

  1. There is no index.end
  2. index[-1] returns '2018-01-03'
  3. index[-1] + index.freq works in this case but is wrong for freq='2D'
eumiro asked Oct 09 '18 13:10


1 Answer

There's no way, because this information is lost once the object is constructed. At creation time, the interval is expanded into the resulting sequence of points, and only that sequence is kept:

pandas/core/indexes/datetimes.py:

class DatetimeIndex(<...>):

    <...>

    @classmethod
    def _generate(cls, start, end, periods, name, freq,
                  tz=None, normalize=False, ambiguous='raise', closed=None):
        <...>

                index = tools.to_datetime(np.linspace(start.value,
                                                      end.value, periods),
                                          utc=True)
                <...>

        if not left_closed and len(index) and index[0] == start:
            index = index[1:]
        if not right_closed and len(index) and index[-1] == end:
            index = index[:-1]
        index = cls._simple_new(index, name=name, freq=freq, tz=tz)
        return index

The closed information isn't saved anywhere either, so you can't even infer the interval from the first/last point and the step.
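
To illustrate why the first/last point and step are not enough, here is a small sketch. It uses pd.date_range rather than the constructor call from the question (newer pandas releases no longer accept start/end in the DatetimeIndex constructor), and it emulates a left-closed interval by dropping the last point by hand. The dates and freq='2D' are just example values:

```python
import pandas as pd

# Interval [2018-01-01, 2018-01-04) with a 2-day step.
# 2018-01-04 is not on the grid, so it never appears in the result anyway.
a = pd.date_range(start='2018-01-01', end='2018-01-04', freq='2D')

# Interval [2018-01-01, 2018-01-05) with the same step.
# 2018-01-05 is on the grid, so drop it by hand to emulate closed='left'.
b = pd.date_range(start='2018-01-01', end='2018-01-05', freq='2D')[:-1]

# Two different intervals produce the same index.
print(a.equals(b))

# Guessing the open end from the last point and the step gives 2018-01-05,
# which is right for b but not for a.
guess = a[-1] + a.freq
print(guess)
```

Both intervals end up as the same two points, so nothing in the index itself can tell them apart.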


You can subclass DatetimeIndex and save this information. Note that it's an immutable type, so you need to override __new__ instead of __init__:

import collections
import inspect

import pandas as pd


class SiDatetimeIndex(pd.DatetimeIndex):

    _Interval = collections.namedtuple('Interval',
                                       ('start', 'end', 'freq', 'closed'))
    # Add 'interval' to dir(): DatetimeIndex inherits pandas.core.accessor.DirNamesMixin
    _accessors = pd.DatetimeIndex._accessors | frozenset(('interval',))

    def __new__(cls, *args, **kwargs):
        base_new = super(SiDatetimeIndex, cls).__new__
        # Map positional and keyword arguments to the base constructor's parameter names
        callargs = inspect.getcallargs(base_new, cls, *args, **kwargs)
        result = base_new(**callargs)
        # Remember the arguments that define the interval
        result.interval = cls._Interval._make(callargs[a] for a in cls._Interval._fields)
        return result


In [31]: index = SiDatetimeIndex(start='2018-01-01',
...:                              end='2018-01-04',
...:                              freq='1D',
...:                              closed='left')

In [38]: index.interval
Out[38]: Interval(start='2018-01-01', end='2018-01-04', freq='1D', closed='left')

Don't expect, though, that all the pandas methods (including the ones your class inherits) will now magically start returning instances of your subclass. For that, you'd need to replace the live references to the base class in the loaded pandas modules that those methods use. Alternatively, you can replace just the original's __new__; then there's no need to replace references.
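
If subclassing turns out to be too fragile, another option is to keep the interval next to the index instead of inside it. The following is only a sketch of that idea: HalfOpenRange is a hypothetical helper name, not part of pandas, and it emulates closed='left' by hand so that it does not depend on any particular pandas version's keyword for this:

```python
import pandas as pd


class HalfOpenRange:
    """Hypothetical helper: stores start/end/freq alongside the generated index."""

    def __init__(self, start, end, freq):
        self.start = pd.Timestamp(start)
        self.end = pd.Timestamp(end)
        self.freq = freq
        points = pd.date_range(self.start, self.end, freq=freq)
        # Emulate closed='left': drop the last point only if it equals the open end.
        if len(points) and points[-1] == self.end:
            points = points[:-1]
        self.index = points


r = HalfOpenRange('2018-01-01', '2018-01-04', '1D')
print(r.index)  # the three daily points from 2018-01-01 to 2018-01-03
print(r.end)    # the open end, still available for the DB query
```

The DB query can then use r.start and r.end directly, while r.index is an ordinary DatetimeIndex that works with the rest of pandas.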

ivan_pozdeev answered Oct 24 '22 09:10