
How do I calculate a rolling idxmax?

Consider the pd.Series s:

import pandas as pd
import numpy as np

np.random.seed([3,1415])
s = pd.Series(np.random.randint(0, 10, 10), list('abcdefghij'))
s

a    0
b    2
c    7
d    3
e    8
f    7
g    0
h    6
i    8
j    6
dtype: int64

I want the index label of the maximum value in each rolling window of 3. The rolling max itself is easy to get:

s.rolling(3).max()

a    NaN
b    NaN
c    7.0
d    7.0
e    8.0
f    8.0
g    8.0
h    7.0
i    8.0
j    8.0
dtype: float64

What I want is

a    None
b    None
c       c
d       c
e       e
f       e
g       e
h       f
i       i
j       i
dtype: object

What I've done

s.rolling(3).apply(np.argmax)

a    NaN
b    NaN
c    2.0
d    1.0
e    2.0
f    1.0
g    0.0
h    0.0
i    2.0
j    1.0
dtype: float64

which is not what I want: these are positions within each window, not index labels.

piRSquared asked Oct 18 '16 06:10



2 Answers

There is no simple way to do that, because the argument passed to the rolling-applied function is a plain NumPy array, not a pandas Series (in newer pandas versions this corresponds to calling apply with raw=True), so the function doesn't know about the index. Moreover, the rolling functions must return a float result, so they can't directly return the index values if those are not floats.

Here is one approach:

>>> s.index[s.rolling(3).apply(np.argmax)[2:].astype(int)+np.arange(len(s)-2)]
Index([u'c', u'c', u'e', u'e', u'e', u'f', u'i', u'i'], dtype='object')

The idea is to take the argmax values and align them with the series by adding a value indicating how far along in the series we are. (That is, for the first argmax value we add zero, because it is giving us the index into a subsequence starting at index 0 in the original series; for the second argmax value we add one, because it is giving us the index into a subsequence starting at index 1 in the original series; etc.)
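The offset arithmetic can be spelled out on the numbers from the question. This is only an illustrative sketch: the argmax values are copied by hand from the output above (NaN rows dropped), and the names in_window, window_start, and absolute are made up for this example.

```python
import numpy as np
import pandas as pd

index = pd.Index(list('abcdefghij'))

# Position of the max inside each window of 3, copied from the question's output.
in_window = np.array([2, 1, 2, 1, 0, 0, 2, 1])

# The k-th window starts at position k in the original series.
window_start = np.arange(len(in_window))

# Position in the whole series = window start + position inside the window.
absolute = in_window + window_start   # [2, 2, 4, 4, 4, 5, 8, 8]

# Translate positions into index labels.
labels = index[absolute]
```

For example, the fourth window covers d, e, f and its argmax is 1, so the absolute position is 3 + 1 = 4, which is the label e.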

This gives the correct results, but doesn't include the two "None" values at the beginning; you'd have to add those back manually if you wanted them.
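One way to add the leading None values back is to wrap the same idea in a small function. The sketch below is not a pandas API: rolling_idxmax is a hypothetical helper name, it assumes the series contains no NaN values (otherwise the offsets would no longer line up after dropna), and it passes raw=True so that np.argmax receives a NumPy array. The series is rebuilt from the values shown in the question rather than from the random seed.

```python
import numpy as np
import pandas as pd

def rolling_idxmax(s, window):
    """Hypothetical helper: label of the max in each trailing window of s."""
    # Position of the max inside each window; incomplete windows give NaN and are dropped.
    in_window = s.rolling(window).apply(np.argmax, raw=True).dropna().astype(int)
    # Add each window's starting offset to get positions in the full series.
    absolute = in_window.to_numpy() + np.arange(len(in_window))
    # Pad the front with None for the rows that have no complete window.
    padding = [None] * (len(s) - len(absolute))
    return pd.Series(padding + list(s.index[absolute]), index=s.index, dtype=object)

s = pd.Series([0, 2, 7, 3, 8, 7, 0, 6, 8, 6], index=list('abcdefghij'))
result = rolling_idxmax(s, 3)
```

Here result holds None for a and b, followed by c, c, e, e, e, f, i, i, which matches the output requested in the question.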

There is an open pandas issue to add rolling idxmax.

BrenBarn answered Oct 17 '22 18:10


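I used a generator:

def idxmax(s, w):
    # Yield the index label of the max in each window of w consecutive rows.
    i = 0
    while i + w <= len(s):
        yield s.iloc[i:i+w].idxmax()
        i += 1

# Each result is labeled by the last row of its window, so the first w - 1 labels are skipped.
pd.Series(idxmax(s, 3), s.index[2:])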

c    c
d    c
e    e
f    e
g    e
h    f
i    i
j    i
dtype: object
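
For longer series the Python-level loop in the generator can become slow. A possible alternative, which is a different technique from the generator above, is to build all windows at once with NumPy's sliding_window_view (this assumes NumPy 1.20 or later) and take the argmax along each row. The variable names are illustrative, and the series is rebuilt from the values shown in the question.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

s = pd.Series([0, 2, 7, 3, 8, 7, 0, 6, 8, 6], index=list('abcdefghij'))
w = 3

# One row per complete window; shape is (len(s) - w + 1, w).
windows = sliding_window_view(s.to_numpy(), w)

# In-window argmax plus the window's starting offset gives positions in s.
absolute = windows.argmax(axis=1) + np.arange(len(windows))

# Label each result by the last row of its window, as in the generator version.
result = pd.Series(s.index[absolute].to_numpy(), index=s.index[w - 1:])
```

As with idxmax, ties are resolved in favor of the first occurrence within the window.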

piRSquared answered Oct 17 '22 19:10