Tricky slicing specifications on business-day datetimeindex

Tags:

pandas

I have a pandas dataframe with a business-day-based DatetimeIndex. For each month in the index, I also have a single 'marker' day specified.

Here's a toy version of that dataframe:

import pandas as pd

# a dataframe with business days as the index
df = pd.DataFrame(list(range(91)), pd.date_range('2015-04-01', '2015-06-30'), columns=['foo']).resample('B').last()

# each month has a single, arbitrary marker day specified
marker_dates = [df.index[12], df.index[33], df.index[57]]

For each month in the index, I need to calculate the average of the foo column over a specific slice of rows in that month.

There are two different ways I need to be able to specify those slices:

1) m'th day to n'th day.

An example might be the 2nd to 4th business day of each month. April would be the average of 1 (Apr 2), 4 (Apr 3), and 5 (Apr 6) = 3.33. May would be the average of 33 (May 4), 34 (May 5), and 35 (May 6) = 34. Weekends and holidays that don't appear in the index don't count as days.

2) m'th day before/after the marker date to the n'th day before/after the marker date.

An example might be "the average of the slice from 1 day before the marker date to 1 day after the marker date in each month". E.g., in April the marker date is Apr 17. Looking at the index, we want the average of Apr 16, Apr 17, and Apr 20.

For Example 1, I had an ugly solution: for each month, I would slice out that month's rows and then apply df_slice.iloc[m:n].mean().
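A minimal sketch of that per-month loop, for reference. It assumes m and n are 1-based and inclusive (so the 2nd to 4th business day is iloc[1:4]); the names m, n, and monthly_means are illustrative and not from the original post.

```python
import pandas as pd

# Toy frame from the question: daily values resampled onto business days.
df = pd.DataFrame(
    list(range(91)),
    pd.date_range('2015-04-01', '2015-06-30'),
    columns=['foo'],
).resample('B').last()

m, n = 2, 4  # assumed convention: m'th through n'th row of the month, 1-based, inclusive

# Group the rows by calendar month, then slice each month's rows by position.
# iloc is 0-based and end-exclusive, so rows m..n become iloc[m - 1:n].
monthly_means = {}
for period, month_rows in df.groupby(df.index.to_period('M')):
    monthly_means[str(period)] = month_rows['foo'].iloc[m - 1:n].mean()

print(monthly_means)
```

Because the slice is positional, only rows that actually exist in the index are counted, which matches the requirement that missing weekends and holidays are not days.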

Whenever I start doing iterative things with pandas, I suspect I'm doing it wrong, so I imagine there is a cleaner, more pythonic/vectorized way to produce this result for all the months.

For Example 2, I don't know a good way to do this slice-averaging based on arbitrary dates across many months.


asked May 25 '18 11:05

user3556757

2 Answers

Use BDay() from pandas.tseries.offsets to build the window boundaries, then group the rows by those windows:

import pandas as pd
from pandas.tseries.offsets import BDay 

M=2
N=4

start_date = pd.Timestamp(2015, 4, 1)
end_date = pd.Timestamp(2015, 6, 30)

df = pd.DataFrame(list(range(91)), pd.date_range('2015-04-01', '2015-6-30'), columns=['foo']).resample('B').last()

# Example 1: anchor on the first business day of each month
marker_dates = pd.date_range(start=start_date, end=end_date, freq='BMS')

# one closed interval per month, from the M'th to the N'th business day
bins = pd.IntervalIndex.from_tuples([ (d + (M-1)*BDay(), d + (N-1)*BDay()) for d in marker_dates ], closed='both')

df.groupby(pd.cut(df.index, bins)).mean()
#[2015-04-02, 2015-04-06]   3.333333
#[2015-05-04, 2015-05-06]  34.000000
#[2015-06-02, 2015-06-04]  63.000000


# Example 2: arbitrary marker dates
marker_dates = [df.index[12], df.index[33], df.index[57]]

# from M business days before each marker to N business days after it
bins = pd.IntervalIndex.from_tuples([ (d - M*BDay(), d + N*BDay()) for d in marker_dates ], closed='both')

df.groupby(pd.cut(df.index, bins)).mean()
#[2015-04-15, 2015-04-23]  18.428571
#[2015-05-14, 2015-05-22]  48.000000
#[2015-06-17, 2015-06-25]  81.428571
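One caveat: a plain BDay() skips weekends only, so if the index also omits holidays, the offsets above would still count those missing days. A hedged alternative sketch counts rows that are actually present in the index instead. The helper name mean_around_markers and its before/after parameters are hypothetical, not part of pandas or of the answer above.

```python
import numpy as np
import pandas as pd

# Toy frame and marker dates from the question.
df = pd.DataFrame(
    list(range(91)),
    pd.date_range('2015-04-01', '2015-06-30'),
    columns=['foo'],
).resample('B').last()
marker_dates = [df.index[12], df.index[33], df.index[57]]

def mean_around_markers(frame, markers, before, after):
    """Hypothetical helper: mean of 'foo' from `before` rows ahead of each
    marker to `after` rows past it, counting only rows present in the index."""
    positions = frame.index.get_indexer(markers)  # -1 where a marker is not in the index
    values = frame['foo'].to_numpy()
    means = [
        # Slices are clamped at the start and end of the frame, so windows
        # near the edges are simply shorter.
        values[max(pos - before, 0):pos + after + 1].mean() if pos >= 0 else np.nan
        for pos in positions
    ]
    return pd.Series(means, index=pd.DatetimeIndex(markers), name='foo')

result = mean_around_markers(df, marker_dates, before=2, after=4)
print(result)
```

On this toy frame, which has no holidays, the result should agree with the BDay-based windows above; the two approaches would only differ once the index skips a weekday.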

answered Sep 28 '22 08:09

jxc

The most pythonic/vectorized (pandonic?) way to do this might be to use df.rolling and df.shift to generate the window over which you'll take the average, then df.reindex to select the value at the dates you've marked.

For your example (2), this could look like:

df['foo'].rolling(3).mean().shift(-1).reindex(marker_dates)
Out[8]: 
2015-04-17    17.333333
2015-05-18    47.000000
2015-06-19    80.333333
Name: foo, dtype: float64

This could be wrapped in a small function:

def window_mean_at_indices(df, indices, begin=-1, end=1):
    return df.rolling(1+end-begin).mean().shift(-end).reindex(indices)
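As a quick check of how begin and end map onto the window, the sketch below rebuilds the toy frame from the question and calls the function twice: once with the defaults (one row before to one row after each marker) and once with an asymmetric window. The variable names around and wide are illustrative.

```python
import pandas as pd

# Toy frame and marker dates from the question.
df = pd.DataFrame(
    list(range(91)),
    pd.date_range('2015-04-01', '2015-06-30'),
    columns=['foo'],
).resample('B').last()
marker_dates = [df.index[12], df.index[33], df.index[57]]

def window_mean_at_indices(df, indices, begin=-1, end=1):
    # rolling(k).mean() at row i averages rows i-k+1 .. i; shifting by -end
    # moves that value back so row i holds the mean of rows i+begin .. i+end.
    return df.rolling(1 + end - begin).mean().shift(-end).reindex(indices)

# Default window: 1 row before to 1 row after each marker.
around = window_mean_at_indices(df['foo'], marker_dates)

# Asymmetric window: 2 rows before to 4 rows after each marker.
wide = window_mean_at_indices(df['foo'], marker_dates, begin=-2, end=4)

print(around)
print(wide)
```

Note that the window is counted in rows of the index, not calendar days, and that it is not clipped at month boundaries.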

This makes it clearer how to handle situation (1): anchor the window at the first business day of each month.

month_starts = pd.date_range(df.index.min(), df.index.max(), freq='BMS')

month_starts
Out[11]: DatetimeIndex(['2015-04-01', '2015-05-01', '2015-06-01'],
                       dtype='datetime64[ns]', freq='BMS')

window_mean_at_indices(df['foo'], month_starts, begin=1, end=3)
Out[12]: 
2015-04-01     3.333333
2015-05-01    34.000000
2015-06-01    63.000000
Freq: BMS, Name: foo, dtype: float64

answered Sep 28 '22 09:09

EFT
