I am working with some financial data organized as a DataFrame (df) with a MultiIndex that contains the ticker and the date, and a column that contains the return. I am wondering whether one should convert the index to a PeriodIndex instead of a DatetimeIndex, since returns are really measured over a period rather than at an instant in time. Besides the philosophical argument, what practical functionality does PeriodIndex provide that may be useful in this particular use case compared with DatetimeIndex?
Some attributes available on DatetimeIndex (such as is_month_start and is_quarter_end) are not available on PeriodIndex. I use PeriodIndex when a DatetimeIndex cannot give me the format I need. For example, if I need a monthly frequency in the format yyyy-mm, I use a PeriodIndex.
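To make the contrast concrete, here is a minimal sketch. The dates are hypothetical and chosen only for illustration; it shows an instant-based flag on a DatetimeIndex next to the span-based features a PeriodIndex offers (yyyy-mm labels, explicit period boundaries, and period arithmetic).

```python
import pandas as pd

# Hypothetical month-end dates, for illustration only.
dti = pd.DatetimeIndex(["2020-01-31", "2020-02-29", "2020-03-31"])

# Instant-based flags live on DatetimeIndex.
month_end_flags = dti.is_month_end

# Converting to monthly periods gives one label per whole month.
pi = dti.to_period("M")
labels = pi.astype(str)   # yyyy-mm strings

# Each period knows its own boundaries ...
starts = pi.start_time    # first instant of each month
ends = pi.end_time        # last instant of each month

# ... and supports arithmetic in units of its frequency.
shifted = pi + 1          # the following month for each entry
```

For return data, start_time and end_time may be the most useful part, since they state explicitly which interval each return covers.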
Example: Assume that df has the following index:
df.index
'2020-02-26 13:50:00', '2020-02-27 14:20:00',
'2020-02-28 11:10:00', '2020-02-29 13:50:00'],
dtype='datetime64[ns]', name='peak_time', length=1025, freq=None)
The monthly minimum can be obtained with the following code:
dfg = df.groupby([df.index.year, df.index.month]).min()
whose index is a MultiIndex
dfg.index
MultiIndex([(2017, 1),
...
(2020, 1),
(2020, 2)],
names=['peak_time', 'peak_time'])
Now I convert it to a PeriodIndex:
dfg["date"] = pd.PeriodIndex(dfg.index.map(lambda x: "{0}{1:02d}".format(*x)), freq="M")
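As a possible shortcut, the grouping and the conversion can be combined by grouping on monthly periods directly. The sketch below uses a small hypothetical frame standing in for df; the column name "value" and the sample timestamps are assumptions, not taken from the original data.

```python
import pandas as pd

# Hypothetical data standing in for df; the "value" column is an assumption.
idx = pd.DatetimeIndex(
    ["2020-01-05 10:00", "2020-01-20 12:30",
     "2020-02-26 13:50", "2020-02-29 13:50"],
    name="peak_time",
)
df = pd.DataFrame({"value": [3.0, 1.0, 4.0, 2.0]}, index=idx)

# Group on monthly periods; the result is indexed by period
# rather than by a (year, month) MultiIndex.
dfg = df.groupby(df.index.to_period("M")).min()
```

Here dfg.index already displays in yyyy-mm form, so no string formatting step is needed.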