Pandas MultiIndex DataFrame.rolling offset

Why can't I use an offset when rolling a multi-index DataFrame? For example, with:

import numpy as np
import pandas as pd

rng = pd.date_range('2017-01-03', periods=20, freq='8D')
i = pd.MultiIndex.from_product([['A','B','C'], rng], names=['Name','Date'])
df = pd.DataFrame(np.random.randn(60), i, columns=['Vals'])

If I try grouping and rolling with an offset, I get "ValueError: window must be an integer":

df['Avg'] = df.groupby(['Name'])['Vals'].rolling('30D').mean() # << Why doesn't this work?
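A minimal sketch of why this fails, assuming a recent pandas: each group handed to rolling() keeps the full (Name, Date) MultiIndex rather than a plain DatetimeIndex, so pandas can't interpret '30D' as a time window and instead validates it as an integer. The variable names `group` and `error_message` are illustrative, and the exact error text varies between pandas versions.

```python
import numpy as np
import pandas as pd

# Same setup as above (random values, so only the index structure matters).
rng = pd.date_range('2017-01-03', periods=20, freq='8D')
idx = pd.MultiIndex.from_product([['A', 'B', 'C'], rng], names=['Name', 'Date'])
df = pd.DataFrame(np.random.randn(60), idx, columns=['Vals'])

# Grouping does not drop the 'Name' level: each group still has a MultiIndex.
name, group = next(iter(df.groupby('Name')['Vals']))
print(name, type(group.index).__name__)  # A MultiIndex

# An offset window needs a datetime-like index, so the window is checked
# as an integer instead and a ValueError is raised when rolling() is called.
try:
    df.groupby('Name')['Vals'].rolling('30D')
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None
print('ValueError:', error_message)
```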

The following variants don't meet my needs, but note that grouping and rolling with an integer window works:

df['Avg'] = df.groupby(['Name'])['Vals'].rolling(4).mean()
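An integer window is not a general substitute because it counts rows, while an offset window measures elapsed time. A small sketch on an illustrative series `s` (not from the question) shows that the two only coincide when the dates are evenly spaced: with 8-day steps, the default right-closed window (t - 30 days, t] holds exactly 4 points.

```python
import numpy as np
import pandas as pd

# Illustrative regularly spaced data, 8 days apart.
dates = pd.date_range('2017-01-03', periods=20, freq='8D')
s = pd.Series(np.arange(20, dtype=float), index=dates)

# Integer window: last 4 rows. Offset window: rows within the last 30 days.
# min_periods=1 matches the offset window's default for the first rows.
by_rows = s.rolling(4, min_periods=1).mean()
by_time = s.rolling('30D').mean()
print(np.allclose(by_rows, by_time))  # True

# Dropping a few dates breaks the regular spacing and the results diverge.
irregular = s.drop(s.index[5:8])
irr_rows = irregular.rolling(4, min_periods=1).mean()
irr_time = irregular.rolling('30D').mean()
print(np.allclose(irr_rows, irr_time))  # False
```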

And I can roll with an offset on a single-index subset of the DataFrame:

d = df.loc['A'].copy()  # copy so the column assignment below doesn't raise SettingWithCopyWarning
d['Avg'] = d['Vals'].rolling('30D').mean()
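This works because selecting a single level-0 label drops the 'Name' level and leaves a DatetimeIndex. One straightforward, though not necessarily the most efficient, way to extend it to every level-0 item is to loop over the labels and stitch the pieces back together. The names `names`, `pieces`, and `avg` are hypothetical.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2017-01-03', periods=20, freq='8D')
idx = pd.MultiIndex.from_product([['A', 'B', 'C'], rng], names=['Name', 'Date'])
df = pd.DataFrame(np.random.randn(60), idx, columns=['Vals'])

# Selecting one level-0 label leaves only the 'Date' level as the index.
print(type(df.loc['A'].index).__name__)  # DatetimeIndex

# Roll each single-index subset with the offset window, then concatenate
# under a (Name, Date) MultiIndex so the result aligns with df's index.
names = df.index.get_level_values('Name').unique()
pieces = [df.loc[name, 'Vals'].rolling('30D').mean() for name in names]
avg = pd.concat(pieces, keys=names, names=['Name', 'Date'])
df['Avg'] = avg
```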

If it's truly impossible to do rolling with offsets on multi-index DataFrames, what would be the most efficient workaround to apply one to each level-0 index item?

feetwet asked Feb 23 '18 21:02


1 Answer

To use an offset like '30D', you need a plain, single-level date index (a DatetimeIndex). Here the simplest way to get one is to move 'Name' out of the index with reset_index(level='Name'), which leaves only 'Date' as the index, and then group on the 'Name' column:

df['Avg'] = df.reset_index(level='Name').groupby(['Name'])['Vals'].rolling('30D').mean()
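A self-contained sketch of the approach above, split into steps so the intermediate index structure is visible, plus a cross-check against the single-index approach from the question. It assumes pandas 1.x or later, where groupby().rolling() returns an index made of the group key followed by the original index, and where each group's dates only need to be sorted within that group. The names `flat`, `rolled`, and `expected_a` are illustrative.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2017-01-03', periods=20, freq='8D')
idx = pd.MultiIndex.from_product([['A', 'B', 'C'], rng], names=['Name', 'Date'])
df = pd.DataFrame(np.random.randn(60), idx, columns=['Vals'])

# Move 'Name' into a column so the index is just 'Date'.
flat = df.reset_index(level='Name')
print(type(flat.index).__name__)  # DatetimeIndex

# Group on the 'Name' column; the result is indexed by (Name, Date).
rolled = flat.groupby(['Name'])['Vals'].rolling('30D').mean()
print(list(rolled.index.names))  # ['Name', 'Date']

# The (Name, Date) result aligns with df's index on assignment.
df['Avg'] = rolled

# Cross-check one name against rolling its single-index subset directly.
expected_a = df.loc['A', 'Vals'].rolling('30D').mean()
print(np.allclose(df.loc['A', 'Avg'], expected_a))  # True
```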
JohnE answered Oct 12 '22 12:10