I have a pandas dataframe with a business-day-based DateTimeIndex. For each month that's in the index, I also have a single 'marker' day specified.
Here's a toy version of that dataframe:
# a dataframe with business dates as the index
df = pd.DataFrame(list(range(91)), pd.date_range('2015-04-01', '2015-6-30'), columns=['foo']).resample('B').last()
# each month has an single, arbitrary marker day specified
marker_dates = [df.index[12], df.index[33], df.index[57]]
For each month in the index, I need to calculate average of the foo
column in specific slice of rows in that month.
There are two different ways I need to be able to specify those slices:
1) m'th day to n'th day.
Example might be (2rd to 4th business day in that month). So april would be the average of 1 (apr2), 4 (apr3), and 5 (apr 6) = 3.33. May would be 33 (may 4), 34 (may 5), 35 (may 6) = 34. I don't consider the weekends/holidays that don't occur in the index as days.
2) m'th day before/after the marker date to the n'th day before/after the marker date.
Example might be "average of the slice from 1 day before the marker date to 1 day after the marker date in each month" Eg. In April, the marker date is 17Apr. Looking at the index, we want the average of apr16, apr17, and apr20.
For Example 1, I had an ugly solution that foreach month I would slice the rows of that month away, and then apply df_slice.iloc[m:n].mean()
Whenever I start doing iterative things with pandas, I always suspect I'm doing it wrong. So I imagine there is a cleaner, pythonic/vectorized way to make this result for all the months
For Example 2, I don't not know a good way to do this slice-averaging based on arbitrary dates across many months.
Using pandas datetime properties. Initially, the values in datetime are character strings and do not provide any datetime operations (e.g. extract the year, day of the week,…). By applying the to_datetime function, pandas interprets the strings and convert these to datetime (i.e. datetime64[ns, UTC] ) objects.
Slice. A set of a data frame's rows you create by using a method from a data frame instance or another data frame slice.
Use BDay() from pandas.tseries.offsets
import pandas as pd
from pandas.tseries.offsets import BDay
M=2
N=4
start_date = pd.datetime(2015,4,1)
end_date = pd.datetime(2015,6,30)
df = pd.DataFrame(list(range(91)), pd.date_range('2015-04-01', '2015-6-30'), columns=['foo']).resample('B').last()
# for month starts
marker_dates = pd.date_range(start=start_date, end=end_date, freq='BMS')
# create IntervalIndex
bins = pd.IntervalIndex.from_tuples([ (d + (M-1)*BDay(), d + (N-1)*BDay()) for d in marker_dates ], closed='both')
df.groupby(pd.cut(df.index, bins)).mean()
#[2015-04-02, 2015-04-06] 3.333333
#[2015-05-04, 2015-05-06] 34.000000
#[2015-06-02, 2015-06-04] 63.000000
# any markers
marker_dates = [df.index[12], df.index[33], df.index[57]]
# M Bday before, and N Bday after
bins = pd.IntervalIndex.from_tuples([ (d - M*BDay(), d + N*BDay()) for d in marker_dates ], closed='both')
df.groupby(pd.cut(df.index, bins)).mean()
#[2015-04-15, 2015-04-23] 18.428571
#[2015-05-14, 2015-05-22] 48.000000
#[2015-06-17, 2015-06-25] 81.428571
The most pythonic/vectorized (pandonic?) way to do this might be to use df.rolling and df.shift to generate the window over which you'll take the average, then df.reindex to select the value at the dates you've marked.
For your example (2), this could look like:
df['foo'].rolling(3).mean().shift(-1).reindex(marker_dates)
Out[8]:
2015-04-17 17.333333
2015-05-18 47.000000
2015-06-19 80.333333
Name: foo, dtype: float64
This could be wrapped in a small function:
def window_mean_at_indices(df, indices, begin=-1, end=1):
return df.rolling(1+end-begin).mean().shift(-end).reindex(indices)
Helping to make it more clear how to apply this to situation (1):
month_starts = pd.date_range(df.index.min(), df.index.max(), freq='BMS')
month_starts
Out[11]: DatetimeIndex(['2015-04-01', '2015-05-01', '2015-06-01'],
dtype='datetime64[ns]', freq='BMS')
window_mean_at_indices(df['foo'], month_starts, begin=1, end=3)
Out[12]:
2015-04-01 3.333333
2015-05-01 34.000000
2015-06-01 63.000000
Freq: BMS, Name: foo, dtype: float64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With