 

Irregular, non-contiguous Periods in Pandas

I need to represent a sequence of events. These events are a little unusual in that they are:

  • non-contiguous
  • non-overlapping
  • irregular duration

For example:

  • 1200 - 1203
  • 1210 - 1225
  • 1304 - 1502
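As an illustration of the three properties above, here is a minimal sketch that stores the example events as plain start/end columns and checks them. It assumes the numbers are HHMM clock times on an arbitrary, hypothetical date (2013-08-28); the column names `start` and `end` are also illustrative.

```python
import pandas as pd

# Hypothetical: the example events read as HHMM clock times on an arbitrary date.
events = pd.DataFrame({
    "start": pd.to_datetime(["2013-08-28 12:00", "2013-08-28 12:10", "2013-08-28 13:04"]),
    "end": pd.to_datetime(["2013-08-28 12:03", "2013-08-28 12:25", "2013-08-28 15:02"]),
})

events = events.sort_values("start").reset_index(drop=True)
durations = events["end"] - events["start"]

# Gap between each event's end and the next event's start (NaT for the last event).
gaps = events["start"].shift(-1) - events["end"]

non_overlapping = bool((gaps.dropna() >= pd.Timedelta(0)).all())
non_contiguous = bool((gaps.dropna() > pd.Timedelta(0)).all())
irregular = durations.nunique() > 1  # more than one distinct duration
```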

I would like to represent these events using a pandas.PeriodIndex, but I can't figure out how to create Period objects with irregular durations.

I have two questions:

  1. Is there a way to create Period objects with irregular durations using existing Pandas functionality?
  2. If not, could you suggest how to modify Pandas in order to provide irregular duration Period objects? (this comment suggests that it might be possible "using custom DateOffset classes with appropriately crafted onOffset, rollforward, rollback, and apply methods")
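For question 1, pandas releases from 0.20 onward (i.e. after this question was asked) include `IntervalIndex`, which holds spans with arbitrary start and end points rather than Periods. This is a swapped-in technique, not something PeriodIndex itself supports. The sketch below again assumes the example events are HHMM times on a hypothetical date and treats each event as the half-open span [start, end).

```python
import pandas as pd

# Hypothetical date; the example events are read as HHMM clock times.
starts = pd.to_datetime(["2013-08-28 12:00", "2013-08-28 12:10", "2013-08-28 13:04"])
ends = pd.to_datetime(["2013-08-28 12:03", "2013-08-28 12:25", "2013-08-28 15:02"])

# closed="left": each event covers [start, end), so durations can differ freely.
events = pd.IntervalIndex.from_arrays(starts, ends, closed="left")

durations = events.length          # TimedeltaIndex of per-event durations
overlapping = events.is_overlapping

# Map query timestamps to the position of the event containing them;
# -1 means the timestamp falls in a gap between events.
queries = pd.to_datetime(["2013-08-28 12:01", "2013-08-28 12:05", "2013-08-28 14:00"])
positions = events.get_indexer(queries)
```

Because the intervals do not overlap, `get_indexer` can return a single position per query, which is the same kind of lookup a PeriodIndex would give for regular spans.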

Notes

  1. The docstring for Period suggests that it is possible to specify arbitrary durations like 5T for "5 minutes". I believe this docstring is incorrect: running pd.Period('2013-01-01', freq='5T') raises ValueError: Only mult == 1 supported. I have reported this issue.
  2. The "time stamps vs time spans" section in the Pandas documentation states "For regular time spans, pandas uses Period objects for scalar values and PeriodIndex for sequences of spans. Better support for irregular intervals with arbitrary start and end points are forth-coming in future releases." (my emphasis)
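Regarding note 1, more recent pandas releases do appear to accept multiplied frequencies in Period, although each Period still has a single fixed length determined by its freq. A minimal sketch, assuming a recent pandas where "min" is the preferred minute alias ("T" is deprecated in newer versions):

```python
import pandas as pd

# A 5-minute Period; "min" is the minute alias ("T" is deprecated in recent pandas).
p = pd.Period("2013-01-01 00:00", freq="5min")

start = p.start_time   # first instant of the span
end = p.end_time       # last nanosecond of the span
nxt = p + 1            # the following 5-minute Period

# The length is fixed by freq, so events such as 12:00-12:03 and 12:10-12:25
# still cannot both be expressed with one freq.
```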

Update 1

Building a Period with a custom duration looks fairly straightforward. But I think the main stumbling block will be persuading PeriodIndex to accept Periods with different freqs, e.g.:

In [93]: pd.PeriodIndex([pd.Period('2000', freq='D'), 
                         pd.Period('2001', freq='T')])

ValueError: 2001-01-01 00:00 is wrong freq

It looks like a central assumption in PeriodIndex is that every Period has the same freq.
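One way around the single-freq assumption, sketched here as an alternative rather than a change to PeriodIndex, is to convert each Period into an explicit [start, next start) span and collect those spans in an `IntervalIndex` (available in later pandas versions), which does not care what freq produced each span. The example reuses the two Periods above, with "min" substituted for the deprecated "T" alias.

```python
import pandas as pd

# The two Periods from the example above, with different freqs ("min" replaces "T").
periods = [pd.Period("2000", freq="D"), pd.Period("2001", freq="min")]

# Turn each Period into its half-open span [start, start of the next Period);
# the resulting intervals no longer need to share a freq.
spans = pd.IntervalIndex.from_arrays(
    [p.start_time for p in periods],
    [(p + 1).start_time for p in periods],
    closed="left",
)
```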

Jack Kelly asked Aug 28 '13 11:08



1 Answer

A possible solution, depending on the application, is to bin your data: create a PeriodIndex whose period equals the smallest unit of time resolution you need to handle your data, assign each event's data to the bins that event covers, and leave the remaining bins null.
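A minimal sketch of this binning approach, assuming one-minute resolution is fine enough and that the question's example events are HHMM times on a hypothetical date. The event labels "a", "b", "c" are illustrative; each minute bin whose start falls inside [start, end) gets the event's label, and every other bin stays null.

```python
import pandas as pd

# Hypothetical events: HHMM clock times on an arbitrary date, with illustrative labels.
events = pd.DataFrame({
    "start": pd.to_datetime(["2013-08-28 12:00", "2013-08-28 12:10", "2013-08-28 13:04"]),
    "end": pd.to_datetime(["2013-08-28 12:03", "2013-08-28 12:25", "2013-08-28 15:02"]),
    "label": ["a", "b", "c"],
})

# One bin per minute, spanning the earliest start to the latest end (inclusive).
bins = pd.period_range(events["start"].min(), events["end"].max(), freq="min")

# Start with every bin empty (None counts as null).
binned = pd.Series(None, index=bins, dtype="object")

for row in events.itertuples(index=False):
    # Bins whose start lies in [start, end) belong to this event.
    mask = (bins.start_time >= row.start) & (bins.start_time < row.end)
    binned.loc[mask] = row.label
```

The trade-off is memory: a long, sparse timeline at fine resolution produces many empty bins.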

storn answered Sep 24 '22 09:09
